Large language models are great at words, but they struggle to predict the consequences of their own actions in a changing world.
Large language models are usually trained to get good at one kind of reasoning, but real life needs them to be good at many things at once.
ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.
The paper discovers that popular RLVR methods for training language and vision-language models secretly prefer certain answer lengths, which can hurt learning.
This paper teaches AI to pay attention better by training its focus, not just its words.
The paper shows that the popular PPO method for training language models is unfair to rare words and too gentle with very common words, which makes learning slow and unstable.
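To see where that unfairness comes from, here is a minimal sketch of the standard PPO clipped objective for a single token (this is the textbook PPO-clip formula, not this paper's analysis or fix; the function name and numbers are illustrative). The same small absolute probability increase produces a huge ratio on a rare token, so the clip caps its update, while a common token's ratio barely moves and is never clipped.

```python
def ppo_clip_surrogate(p_old, p_new, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one token.

    p_old / p_new: the token's probability under the old and new policy.
    The objective takes the min of the raw and clipped terms, so large
    probability ratios are capped at (1 + eps) when the advantage is positive.
    """
    ratio = p_new / p_old
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)

# Same absolute probability gain (+0.01), positive advantage:
rare = ppo_clip_surrogate(p_old=0.001, p_new=0.011, advantage=1.0)
common = ppo_clip_surrogate(p_old=0.5, p_new=0.51, advantage=1.0)
# The rare token's ratio is 11x but gets capped at 1.2 by the clip,
# while the common token's ratio of 1.02 passes through untouched.
```

The capped surrogate for the rare token means its gradient signal is throttled the moment it improves at all, whereas a very common token can drift for many updates before the clip ever activates, which is the imbalance the summary describes.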
The paper shows how to train a language model with special extra hints (privileged information) during practice so it can still do well later without any hints.
WideSeek-R1 teaches a small 4B-parameter language model to act like a well-run team: one leader plans, many helpers work in parallel, and everyone learns together with reinforcement learning.
The paper teaches multimodal large language models (MLLMs) to stop guessing from just text or just images and instead check both together before answering.
AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.
This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.