Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were.
The paper trains AI agents more effectively by grading not just their final answers, but also how they reason and use tools along the way.
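The idea of grading the journey as well as the destination can be sketched as follows. This is a minimal illustration, not the paper's actual method: the function name, the weighting, and the 0-to-1 step scores are all assumptions.

```python
# Hypothetical sketch: blend step-level grades on the agent's reasoning
# and tool calls with the final pass/fail outcome, instead of using a
# single binary reward. The 50/50 weighting is an illustrative choice.

def trajectory_reward(step_scores, final_correct, step_weight=0.5):
    """Blend average step quality (each in [0, 1]) with the final outcome."""
    step_part = sum(step_scores) / len(step_scores) if step_scores else 0.0
    final_part = 1.0 if final_correct else 0.0
    return step_weight * step_part + (1 - step_weight) * final_part

# A failed attempt with decent intermediate reasoning still earns partial credit.
print(trajectory_reward([1.0, 0.5, 0.0], final_correct=False))  # 0.25
```

The point of the blend is that two attempts with the same wrong answer can now receive different rewards, depending on how sound the intermediate steps were.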
The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.
The paper shows how to make AI think faster and smarter by planning in a hidden space instead of writing long step-by-step sentences.
MemOCR is a new way for AI to remember long histories by turning important notes into a picture with big, bold parts for key facts and tiny parts for details.
This paper shows that many reasoning failures in AI are caused by just a few distracting words in the prompt, not because the problems are too hard.
This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.
When language models are trained with RL using right-or-wrong rewards, learning can stall on 'saturated' problems that the model almost always solves.
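Why saturated problems stall learning can be seen in a few lines. This sketch assumes a GRPO-style setup where advantages are normalized within a group of sampled answers to the same problem; it is an illustration of the mechanism, not the paper's code.

```python
# Sketch: with group-normalized advantages, a problem where every sampled
# answer gets the same pass/fail reward contributes zero advantage for
# every sample, so the policy gradient from that problem vanishes.

def group_advantages(rewards):
    """Normalize rewards within one group of samples for a single problem."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:                      # all rewards identical: saturated
        return [0.0] * len(rewards)   # -> no learning signal at all
    return [(r - mean) / std for r in rewards]

print(group_advantages([1, 1, 1, 1]))  # saturated: [0.0, 0.0, 0.0, 0.0]
print(group_advantages([1, 0, 1, 0]))  # informative: [1.0, -1.0, 1.0, -1.0]
```

Once the model solves a problem on nearly every attempt, the group's rewards are almost always identical, so that problem effectively drops out of training.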
The paper teaches large language models to learn from detailed feedback (like error messages) instead of only a simple pass/fail score.
This paper says that to make math-solving AIs smarter, we should train them more on the hardest questions they can almost solve.
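Selecting the "hardest questions they can almost solve" amounts to a pass-rate filter. The sketch below is a hypothetical implementation of that idea; the function name and the band of pass rates kept are assumptions, not the paper's settings.

```python
# Hypothetical curriculum filter: estimate each problem's pass rate from
# sampled attempts, then keep only problems in the "frontier" band that the
# model can almost, but not reliably, solve.

def select_frontier(problems, pass_rates, low=0.1, high=0.5):
    """Keep problems whose empirical pass rate falls in (low, high]."""
    return [p for p, r in zip(problems, pass_rates) if low < r <= high]

problems = ["easy", "frontier", "hopeless"]
rates = [0.95, 0.3, 0.0]
print(select_frontier(problems, rates))  # ['frontier']
```

Problems that are already mastered or still hopeless are skipped, concentrating training compute where the reward signal is most informative.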
OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.
DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.
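The contrast between one final score and many timely rewards can be made concrete with a return-to-go calculation. This is a generic RL illustration under the assumption of per-step rewards, not DenseGRPO's actual objective.

```python
# Illustrative contrast: credit each step with the sum of rewards from
# that step onward. With a sparse final reward, every step sees the same
# value; with dense per-step rewards, each step gets a distinct signal.

def returns_to_go(rewards, gamma=1.0):
    """Return the (discounted) sum of future rewards at every step."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

print(returns_to_go([0, 0, 0, 1]))  # sparse: [1.0, 1.0, 1.0, 1.0]
print(returns_to_go([1, 2, 3, 4]))  # dense:  [10.0, 9.0, 7.0, 4.0]
```

With the sparse reward, early steps are indistinguishable; dense rewards tell the model which intermediate steps actually helped.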