This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.
PaCoRe is a way for AI to think along many parallel paths and then coordinate them, so it can spend far more compute at test time without running out of context window space.
This paper teaches an AI model to understand both which way an object is facing (orientation) and how it turns between views (rotation), all in one system.
FinVault is a new test that checks if AI helpers for finance stay safe while actually doing real jobs, not just chatting.
The paper shows that language models with a search tool often look up too much information, which wastes compute and can make answers worse on unanswerable questions.
The paper teaches an AI to act like a careful traveler: it looks at a photo, forms guesses about where it might be, and uses real map tools to check each guess.
This paper builds MFMD-Scen, a big test to see how AI changes its true/false judgments about the same money-related claim when the situation around it changes.
This paper teaches a camera to fix nighttime colors by combining a smart rule-based color trick (SGP-LRD) with a learning-by-trying helper (reinforcement learning).
When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.
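A small sketch can make the "squashing" concrete. It assumes the commonly described GRPO-style advantage (each rollout's reward is standardized within its group) and assumes the separate rewards are simply summed before normalization. The reward names and values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_normalized_advantages(total_rewards):
    """Standardize each rollout's total reward within its group (GRPO-style)."""
    mu = mean(total_rewards)
    sigma = pstdev(total_rewards)
    if sigma == 0:
        # Every rollout scored the same, so there is nothing to prefer.
        return [0.0 for _ in total_rewards]
    return [(r - mu) / sigma for r in total_rewards]


# Hypothetical groups of two rollouts, each scored on two binary rewards:
# (correctness, format). These values are illustrative only.
group_a = [(0, 0), (1, 0)]  # best rollout satisfies one reward
group_b = [(0, 0), (1, 1)]  # best rollout satisfies both rewards

# Assumption: the rewards are summed into one scalar before normalization.
adv_a = group_normalized_advantages([sum(r) for r in group_a])
adv_b = group_normalized_advantages([sum(r) for r in group_b])

print(adv_a)
print(adv_b)
```

Under these assumptions both groups produce the same pair of advantages, even though the best rollout in the second group satisfied twice as many rewards. The normalization step removes the scale that distinguished the two reward mixes.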
RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.
PlenopticDreamer is a new way to remake a video from different camera paths while keeping everything consistent across views and over time.