Reasoning models often talk too much, and those extra words can actually make their answers less accurate.
This paper speeds up and improves AI image editing by giving hard edits more attention and easy edits less, just like a smart coach.
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.
JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.
When a model learns from many rewards at once, a popular method called GRPO can accidentally collapse different reward combinations into the same learning signal, so training can no longer tell them apart.
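
The GRPO issue in the last summary can be illustrated with a minimal sketch. It assumes the common setup where the per-rollout rewards are summed and then normalized within the group as (reward - mean) / std; the two-rollout groups, the binary reward values, the use of population standard deviation, and the helper name `grpo_advantages` are all hypothetical choices for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def grpo_advantages(total_rewards, eps=1e-8):
    """Group-normalize summed rewards: (r - mean) / (std + eps)."""
    mu = mean(total_rewards)
    sigma = pstdev(total_rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in total_rewards]


# Each rollout gets two binary rewards (hypothetical values).
group_a = [(0, 0), (1, 0)]  # second rollout satisfies only one reward
group_b = [(0, 0), (1, 1)]  # second rollout satisfies both rewards

# Sum the rewards per rollout, then normalize within each group.
adv_a = grpo_advantages([sum(r) for r in group_a])
adv_b = grpo_advantages([sum(r) for r in group_b])

print(adv_a)
print(adv_b)
```

Both groups end up with advantages of roughly -1 and +1, even though the second rollout in `group_b` earned twice the total reward of its counterpart in `group_a`. Normalizing by the group's mean and spread removes that difference, which is one way distinct reward mixes can turn into the same learning signal.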