Reasoning models often talk too much, and those extra words can actually make their answers worse.
MMR-Life is a new test (benchmark) that checks how well AI understands everyday situations by showing it several real photos at once.
CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.
The paper tackles a common problem in AI: models can read pictures and text well, but they often mess up the logic behind them.
The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?
This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.
Giving large language models a few good examples and step-by-step instructions can make them much better at spotting feelings in text.
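The idea above is usually implemented as a few-shot prompt that pairs each example with a short reasoning step before the label. Here is a minimal sketch of what such a prompt builder could look like; the example texts, reasoning lines, emotion labels, and function name are illustrative assumptions, not taken from the paper.

```python
# Sketch of few-shot prompting with step-by-step reasoning for emotion
# detection. All example texts, labels, and wording here are illustrative
# assumptions, not the paper's actual prompts.

FEW_SHOT_EXAMPLES = [
    ("I can't believe they cancelled my flight again.",
     "The writer reports a repeated negative event and expresses frustration.",
     "anger"),
    ("We finally got the keys to our first house!",
     "The word 'finally' and the exclamation mark signal a long-awaited positive event.",
     "joy"),
]

def build_prompt(text: str) -> str:
    """Assemble a few-shot prompt that asks the model to reason before labeling."""
    parts = ["Identify the main emotion in each text. Think step by step, then answer."]
    for example_text, reasoning, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Text: {example_text}\nReasoning: {reasoning}\nEmotion: {label}")
    # End with an open "Reasoning:" cue so the model explains itself first.
    parts.append(f"Text: {text}\nReasoning:")
    return "\n\n".join(parts)

print(build_prompt("My dog ran away this morning and I can't stop crying."))
```

The completed prompt would then be sent to the model; because every solved example shows reasoning before the label, the model tends to imitate that pattern on the new text.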
LLMs can look confident but still change their answers when the surrounding text nudges them, showing that confidence alone isn’t real truthfulness.
Big reasoning AIs think in many steps, which is slow and costly.
ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
Large reasoning models can often find the right math answer in their “head” before finishing their written steps, but this works best in languages with lots of training data like English and Chinese.
This paper shows a new way (called RISE) to find and control how AI models think without needing any human-made labels.