This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.
HY3D-Bench is a complete, open-source “starter kit” for making and studying high-quality 3D objects.
HySparse is a new way for AI models to pay attention: it mixes a few full-attention layers with many fast, memory-saving sparse layers.
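The summary doesn't give HySparse's actual layer schedule, so the sketch below only illustrates the general hybrid idea, assuming a made-up rule where every fourth layer gets full attention and the rest use a sliding window. The `attention_mask` function, the 1-in-4 ratio, and the window size are all illustrative, not from the paper.

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int,
                   full_every: int = 4, window: int = 128) -> np.ndarray:
    """Build a causal attention mask for one layer.

    Hypothetical schedule: every `full_every`-th layer attends to the
    entire prefix (full attention); all other layers attend only to a
    local window of recent tokens (sparse, memory-saving attention).
    """
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i
    if layer_idx % full_every == 0:
        return causal                   # full-attention layer
    return causal & (i - j < window)    # sliding-window sparse layer

# A full layer keeps the whole KV cache; a windowed layer only needs the
# last `window` keys/values, which is where the memory saving comes from.
mask = attention_mask(seq_len=512, layer_idx=1)
print(mask.sum(), "allowed query-key pairs at layer 1")
```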
The paper shows that using information from many layers of a language model (not just one) helps text-to-image diffusion transformers follow prompts much better.
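How the layers get combined isn't stated in this summary; one simple, common way to use features from many layers is a learned softmax-weighted sum, sketched below. The function name and the weighting scheme are assumptions, not necessarily the paper's mechanism.

```python
import numpy as np

def mix_layer_features(hidden_states: list[np.ndarray],
                       layer_logits: np.ndarray) -> np.ndarray:
    """Combine text-encoder features from several layers into a single
    conditioning tensor for the diffusion transformer.

    hidden_states: one (seq_len, dim) array per selected LLM layer.
    layer_logits:  learnable per-layer scores (assumed mechanism: a
                   softmax-weighted sum, not necessarily the paper's).
    """
    weights = np.exp(layer_logits) / np.exp(layer_logits).sum()
    stacked = np.stack(hidden_states)            # (n_layers, seq, dim)
    return np.einsum("l,lsd->sd", weights, stacked)

# Toy usage: 3 layers of a text encoder, 8 prompt tokens, dim 16.
feats = [np.random.randn(8, 16) for _ in range(3)]
cond = mix_layer_features(feats, layer_logits=np.zeros(3))
print(cond.shape)  # (8, 16) -> fed to the diffusion model's cross-attention
```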
A-RAG lets the AI choose how to search, what to read, and when to stop, instead of following a fixed recipe.
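Here is a minimal sketch of what "choosing how to search, what to read, and when to stop" can look like as a control loop. The action names and helper functions are illustrative stand-ins, not A-RAG's real interface.

```python
# Minimal sketch of an agentic retrieval loop: the model picks the next
# action itself instead of following a fixed search-then-read pipeline.

def agentic_rag(question: str, policy, search, read, max_steps: int = 8) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        # The policy sees the question plus everything gathered so far.
        action, arg = policy(question, notes)
        if action == "search":
            notes.append(f"results: {search(arg)}")
        elif action == "read":
            notes.append(f"document: {read(arg)}")
        elif action == "answer":
            return arg          # the model decided it has enough evidence
    return "ran out of steps"

# Toy demo: a scripted "policy" that searches once, reads once, then answers.
script = iter([("search", "query terms"), ("read", "doc-1"), ("answer", "42")])
print(agentic_rag("toy question",
                  policy=lambda q, n: next(script),
                  search=lambda q: ["doc-1"],
                  read=lambda d: "evidence text"))
```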
SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers by using smart models that stand in for the computer and its tests.
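A toy sketch of that simulation idea: the "computer" the agent talks to is a model call rather than a container. The `fake_model` stub and the prompt format are hypothetical, not SWE-World's actual simulator.

```python
# Instead of running commands in a Docker container, a model predicts
# what the shell and test suite would print.

def fake_model(prompt: str) -> str:
    # A real system would call a trained LM here; this stub only
    # illustrates the interface.
    return "2 passed, 1 failed: test_parser"

def simulated_step(repo_state: str, command: str) -> str:
    """One agent step: the 'computer' is a model, not a container."""
    prompt = f"Repo state:\n{repo_state}\nCommand: {command}\nOutput:"
    return fake_model(prompt)

print(simulated_step("<diff applied to utils.py>", "pytest -x"))
```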
SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests.
The paper builds a simple, math-light rule to predict whether training makes a language model more open-minded (higher entropy) or more sure of itself (lower entropy).
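The rule itself is the paper's contribution and isn't reproduced here; the snippet below only shows the quantity it predicts, the Shannon entropy of the model's next-token distribution, with a toy before/after example.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

before = [0.25, 0.25, 0.25, 0.25]   # unsure: mass spread over 4 tokens
after  = [0.85, 0.05, 0.05, 0.05]   # confident: mass piled on one token
print(entropy(before))  # ~1.39 nats (high entropy, "open-minded")
print(entropy(after))   # ~0.59 nats (low entropy, "sure of itself")
```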
MeKi is a new way to grow a language model’s knowledge by spending storage (ROM) instead of extra computation (FLOPs).
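A toy sketch of that storage-for-compute trade, assuming a nearest-neighbor key-value memory whose retrieved value is added to the hidden state; growing knowledge then means appending rows, not adding parameters. The layout is an assumption, not MeKi's actual design.

```python
import numpy as np

class KnowledgeMemory:
    """Toy key-value store: facts live in storage, not in the weights.

    Growing knowledge means appending rows (more ROM), not widening the
    network (more FLOPs per token). Assumed mechanism, not MeKi's.
    """
    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def add(self, key: np.ndarray, value: np.ndarray) -> None:
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def lookup(self, hidden: np.ndarray) -> np.ndarray:
        # Retrieve the best-matching fact and fold it into the hidden state.
        scores = self.keys @ hidden
        return hidden + self.values[int(scores.argmax())]

dim = 16
mem = KnowledgeMemory(dim)
mem.add(np.random.randn(dim), np.random.randn(dim))
out = mem.lookup(np.random.randn(dim))
print(out.shape)  # (16,) -- same compute path, knowledge came from storage
```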
The paper shows that even if a model is great at predicting when an AI agent will fail, jumping in to “fix” the agent mid-task can still make things worse.
This paper speeds up how AI models read very long texts by carefully choosing which words (tokens) to focus on at each step.
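A minimal sketch of the selection idea: score every cached token cheaply, then run attention over only the top-k. Scoring by raw query-key dot product is an assumed criterion here, not necessarily the paper's.

```python
import numpy as np

def top_k_attention(query: np.ndarray, keys: np.ndarray,
                    values: np.ndarray, k: int = 64) -> np.ndarray:
    """Attend over only the k most relevant cached tokens.

    Scoring every token is cheap (one dot product each); the expensive
    softmax-weighted mix then runs over k tokens instead of the whole text.
    """
    scores = keys @ query                       # (seq_len,)
    idx = np.argpartition(scores, -k)[-k:]      # indices of the top-k tokens
    top = scores[idx]
    weights = np.exp(top - top.max())
    weights /= weights.sum()
    return weights @ values[idx]                # (dim,)

seq_len, dim = 100_000, 32
keys = np.random.randn(seq_len, dim).astype(np.float32)
values = np.random.randn(seq_len, dim).astype(np.float32)
out = top_k_attention(np.random.randn(dim).astype(np.float32), keys, values)
print(out.shape)  # (32,) -- one output, computed from 64 tokens, not 100,000
```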