Robots used to explore by following simple rules or chasing short-term rewards, which often left them wasting time and backtracking.
KAGE-Bench is a fast, carefully controlled benchmark that tests how well reinforcement learning (RL) agents trained on pixels handle specific visual changes, like new backgrounds or lighting, without changing the actual game rules.
The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.
This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.
The paper tackles a big problem: when you merge several RL-trained models by simple weight averaging, their specialized skills get watered down.
Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
RL-trained search agents often sound confident even when they don’t know, which can mislead people.
This paper is the first big map of how AI can fix real software problems, not just write short code snippets.
Cities are full of places defined by people, like schools and parks, which are hard to see clearly from space without extra clues.
STEP3-VL-10B is a small (10-billion-parameter) open multimodal model that sees images and reads text, yet matches the scores of much larger models.
SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by sending the most useful visual information to the language brain, instead of just getting bigger.