RLAnything is a new reinforcement learning (RL) framework that jointly trains three components: the policy (the agent), the reward model (the judge), and the environment (the tasks).
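To make that three-way loop concrete, here is a minimal toy sketch of alternating updates among an agent, a judge, and an adaptive task generator. Everything below (class names, update rules, constants) is a hypothetical illustration of the general idea, not RLAnything's actual algorithm.

```python
import random

class Policy:
    """The agent: improves its skill from judged rewards."""
    def __init__(self):
        self.skill = 0.3
    def act(self, difficulty):
        # Higher skill relative to task difficulty -> higher solve chance.
        p_solve = max(0.05, min(0.95, 0.5 + self.skill - difficulty))
        return random.random() < p_solve
    def update(self, reward):
        self.skill += 0.05 * (reward - 0.5)  # crude gradient-like step

class RewardModel:
    """The judge: a noisy scorer that gets calibrated over time."""
    def __init__(self):
        self.noise = 0.3
    def score(self, solved):
        flipped = random.random() < self.noise
        return float(solved != flipped)
    def update(self):
        # Hypothetical calibration: supervision gradually reduces noise.
        self.noise = max(0.02, self.noise * 0.99)

class Environment:
    """The task generator: adapts difficulty to the agent's success."""
    def __init__(self):
        self.difficulty = 0.2
    def update(self, reward):
        self.difficulty += 0.05 * (reward - 0.5)

policy, judge, env = Policy(), RewardModel(), Environment()
for _ in range(200):
    solved = policy.act(env.difficulty)  # 1) agent attempts a task
    reward = judge.score(solved)         # 2) judge scores the attempt
    policy.update(reward)                # 3) agent learns from the score
    judge.update()                       # 4) judge gets calibrated
    env.update(reward)                   # 5) tasks harden as skill grows

print(f"skill={policy.skill:.2f}, difficulty={env.difficulty:.2f}")
```

In the real framework each update would be a gradient step on a learned model; the point here is only the alternating three-way loop.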
Kimi K2.5 is a new open-source AI model that understands both text and visuals (images and videos) and can work like a team of helpers to finish big tasks faster.
This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameters) AI model that is trained to “think out loud” before answering cybersecurity questions.
LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.
Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.
Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.
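A common recovery pattern is to feed the tool's error message back into the context so the model can repair its arguments instead of re-issuing the identical call. Below is a minimal sketch of that retry loop; `run_tool` and `call_model` are hypothetical stand-ins, not any specific paper's or library's API.

```python
def run_tool(name, args):
    # Hypothetical tool: fails loudly when a required argument is missing.
    if "path" not in args:
        raise ValueError("missing required argument: path")
    return f"{name} ok: {args['path']}"

def call_model(history):
    # Hypothetical stub for an LLM call: it "repairs" the tool call once
    # it has seen an error message in its history.
    if any("missing required argument" in h for h in history):
        return {"name": "read_file", "args": {"path": "README.md"}}
    return {"name": "read_file", "args": {}}  # first attempt is broken

history, result = [], None
for attempt in range(3):
    call = call_model(history)
    try:
        result = run_tool(call["name"], call["args"])
        break
    except ValueError as err:
        # Key step: surface the error so the next call can differ.
        history.append(f"tool error on attempt {attempt + 1}: {err}")

print(result)  # -> "read_file ok: README.md"
```

The failure mode described above is exactly what happens when the `history.append` step is missing: the model sees the same context every time and repeats the same broken call.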
Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.
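For intuition, here is a toy sketch of any-order decoding as masked diffusion language models use it: every position starts masked, and each step fills in whichever positions the model is most confident about. The random scorer below is a placeholder for a real transformer's predictions, and the vocabulary is made up for illustration.

```python
import random

SEQ_LEN, STEPS = 8, 4
tokens = ["<mask>"] * SEQ_LEN

def predict(tokens):
    # Hypothetical scorer: propose (position, token, confidence) for every
    # still-masked position. A real model would predict distributions here.
    vocab = ["the", "cat", "sat", "on", "a", "mat", ".", "big"]
    return [(i, random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == "<mask>"]

for step in range(STEPS):
    proposals = sorted(predict(tokens), key=lambda p: -p[2])
    # Unmask the top-k most confident positions this step -- in any order,
    # which is the freedom (and the risk) the paper is about.
    k = SEQ_LEN // STEPS
    for pos, tok, _conf in proposals[:k]:
        tokens[pos] = tok
    print(f"step {step + 1}: {' '.join(tokens)}")
```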
Robots used to explore by following simple rules or chasing short-term rewards, which often made them waste time and backtrack a lot.
Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.
This paper is the first big map of how AI can fix real software problems, not just write short code snippets.
STEP3-VL-10B is a small (10 billion parameters) open multimodal model that sees images and reads text, yet scores like much larger models.
This paper studies how tool-using AI agents express how sure they are, and finds a split: some tools make them overconfident, while others help them stay honest.
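As a rough illustration of what "overconfident" means here, the sketch below compares stated confidence against actual accuracy per tool. The records are made-up placeholders purely to show the calibration arithmetic, not data from the paper.

```python
# Each record: which tool was used, the confidence the agent verbalized
# (parsed to 0-1), and whether its answer was actually correct.
records = [
    {"tool": "search",     "stated_conf": 0.95, "correct": False},
    {"tool": "search",     "stated_conf": 0.90, "correct": True},
    {"tool": "calculator", "stated_conf": 0.70, "correct": True},
    {"tool": "calculator", "stated_conf": 0.60, "correct": True},
]

by_tool = {}
for r in records:
    by_tool.setdefault(r["tool"], []).append(r)

for tool, rs in by_tool.items():
    avg_conf = sum(r["stated_conf"] for r in rs) / len(rs)
    accuracy = sum(r["correct"] for r in rs) / len(rs)
    # Positive gap = overconfident; negative gap = underconfident.
    print(f"{tool}: stated {avg_conf:.0%} vs actual {accuracy:.0%} "
          f"(gap {avg_conf - accuracy:+.0%})")
```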