Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were.
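To see what a pass/fail signal throws away, here is a minimal, hypothetical sketch; the test-based scoring is an illustrative assumption, not this paper's setup:

```python
# Three attempts at the same task, scored two ways. The binary reward
# makes a near-miss indistinguishable from a total failure; a graded
# reward keeps that information. (Illustrative scoring, assumed here.)
attempts = [
    {"tests_passed": 0,  "tests_total": 10},  # far off
    {"tests_passed": 9,  "tests_total": 10},  # almost there
    {"tests_passed": 10, "tests_total": 10},  # solved
]

def binary_reward(a):
    """Pass/fail: 1.0 only when every test passes."""
    return 1.0 if a["tests_passed"] == a["tests_total"] else 0.0

def graded_reward(a):
    """Partial credit: fraction of tests passed."""
    return a["tests_passed"] / a["tests_total"]

for a in attempts:
    print(binary_reward(a), graded_reward(a))
# binary: 0.0 0.0 1.0  -> the learner can't tell attempt 1 from attempt 2
# graded: 0.0 0.9 1.0  -> the near-miss stays visible
```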
This paper shows that many reasoning failures in AI are caused by just a few distracting words in the prompt, not because the problems are too hard.
This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.
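As a rough illustration of that "reason first, answer second" output style (an assumed format, not necessarily this model's actual training template):

```python
# Assumed illustration of a reasoning-then-answer response; the <think>
# tags are a common convention, not confirmed to be this model's template.
question = "Which port does HTTPS use by default?"
response = (
    "<think>\n"
    "HTTPS is HTTP tunneled over TLS. Plain HTTP defaults to port 80, "
    "and the TLS-wrapped variant was assigned its own well-known port.\n"
    "</think>\n"
    "HTTPS uses port 443 by default."
)
print(response)
```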
LLM agents are usually trained in a few worlds but expected to work in many unseen ones, which often hurts their performance.
Typhoon-S is a simple, open recipe that turns a basic language model into a helpful assistant and then teaches it important local skills, all on small budgets.
Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.
Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.
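One common fix, sketched below with a toy tool and a stub model (both assumptions, not this paper's method), is to feed the tool's error message back into the context so the next call can be corrected rather than repeated:

```python
def call_tool(args):
    """Toy tool that rejects a malformed argument."""
    if "query" not in args:
        raise ValueError("missing required field: 'query'")
    return f"results for {args['query']}"

class StubModel:
    """Stand-in model: first proposes a bad call, but corrects it
    once the context contains an error message to learn from."""
    def propose_call(self, context):
        if "Tool error" in context:
            return {"query": "firewall rules"}  # corrected call
        return {"q": "firewall rules"}          # malformed: wrong key

def repairing_agent(model, task, max_tries=3):
    """Unlike an agent that blindly retries the identical call,
    append each error to the context so the model can adjust."""
    context = task
    for _ in range(max_tries):
        args = model.propose_call(context)
        try:
            return call_tool(args)
        except ValueError as err:
            context += f"\nTool error: {err}"
    return None

print(repairing_agent(StubModel(), "find firewall rules"))
# -> "results for firewall rules" on the second try
```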
Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.
Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.
The paper teaches an AI to act like a careful traveler: it looks at a photo, forms guesses about where it might be, and uses real map tools to check each guess.
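A skeletal version of that guess-and-check loop might look like the following; the helper names (propose_candidates, map_lookup, consistent) are hypothetical, not the paper's API:

```python
def geolocate(photo, model, map_lookup, max_rounds=3):
    """Form location hypotheses from the image, then verify each
    against real map data, keeping only candidates that survive."""
    candidates = model.propose_candidates(photo)    # e.g. ["Lisbon", "Porto"]
    for _ in range(max_rounds):
        survivors = [
            place for place in candidates
            if model.consistent(photo, map_lookup(place))  # map evidence vs. photo
        ]
        if len(survivors) <= 1:
            return survivors[0] if survivors else None
        candidates = survivors                      # narrow down and repeat
    return candidates[0]
```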
Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.
Long-term AI helpers remember past chats, but drawing on all of those memories can trap them in old ideas, a problem called Memory Anchoring.