This paper speeds up and improves AI image editing by giving hard edits more attention and easy edits less, just like a smart coach.
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.
JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.
When a model learns from several rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which blurs the feedback the model trains on.
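To make the squashing effect concrete, here is a minimal sketch, not the paper's code. It assumes the common GRPO-style setup in which each rollout's rewards are summed and the totals are then normalized by the group mean and standard deviation. The two binary rewards (labeled "correctness" and "format"), the group contents, and the function name are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def group_normalized_advantages(total_rewards, eps=1e-8):
    """Normalize summed rewards within one rollout group.

    Assumption: advantage = (total - group mean) / (group std + eps),
    using the population standard deviation.
    """
    mu = mean(total_rewards)
    sigma = pstdev(total_rewards)
    return [(r - mu) / (sigma + eps) for r in total_rewards]


# Each rollout earns two hypothetical binary rewards: (correctness, format).
groups = {
    "A": [(0, 0), (1, 0)],  # totals 0 and 1
    "B": [(0, 0), (1, 1)],  # totals 0 and 2
    "C": [(0, 1), (1, 1)],  # totals 1 and 2
}

advantages = {}
for name, rollouts in groups.items():
    # Sum the per-rollout rewards first, then normalize across the group.
    totals = [sum(r) for r in rollouts]
    advantages[name] = [round(a, 4) for a in group_normalized_advantages(totals)]
    print(name, totals, advantages[name])
```

All three groups end up with the same pair of advantages, because normalizing by mean and standard deviation removes any shift or rescaling of the totals. In group B the better rollout earned both rewards, while in group A it earned only one, yet the learning signal is identical.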