This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.
SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers by using smart models that pretend to be the computer and tests.
SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests.
When rewards are rare, a popular training method for language models (GRPO) often stops learning because every try in a group gets the same score, so there is nothing to compare.
This paper shows that many hard math and AI problems can be solved with one shared idea called homotopy, where we move from an easy version of a problem to the real one step by step.
RLAnything is a new reinforcement learning (RL) framework that trains three things together at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).
The paper trains language models to solve hard problems by first breaking them into smaller parts and then solving those parts, instead of only thinking in one long chain.
MemSkill turns memory operations for AI agents into learnable skills instead of fixed, hand-made rules.
SWE-Universe is a factory-like system that turns real GitHub pull requests into safe, repeatable coding practice worlds with automatic checkers.
Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.
LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.