This paper teaches large language models to learn from detailed feedback (like error messages) instead of only a simple pass/fail score.
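A minimal sketch of that contrast, assuming the feedback is unit-test results plus an error message; the names (`Feedback`, `dense_reward`) and the penalty weight are illustrative, not the paper's actual method:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    passed: int      # number of unit tests that passed
    total: int       # total number of unit tests
    error_msg: str   # compiler/runtime error text, "" if none

def binary_reward(fb: Feedback) -> float:
    # The usual sparse signal: credit only when every test passes.
    return 1.0 if fb.passed == fb.total else 0.0

def dense_reward(fb: Feedback) -> float:
    # Partial credit for passing tests, plus a penalty that distinguishes
    # "crashes with an error" from "runs but gives a wrong answer".
    score = fb.passed / fb.total
    if fb.error_msg:
        score -= 0.25  # illustrative penalty weight, not from the paper
    return max(score, 0.0)
```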
SERA is a new, low-cost way to train coding helpers (agents) that learn the style and inner workings of your own codebase.
AgentLongBench is a new test that checks how well AI agents reason over very long histories made of their own actions and the environment's replies, not just over static documents.
This paper says that to make math-solving AIs smarter, we should train them more on the hardest questions they can almost solve.
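A minimal sketch of what that selection rule might look like, assuming difficulty is measured by an empirical pass rate from sampling the model several times per question; the thresholds and names (`pick_frontier`, `pass_rate`) are illustrative, not from the paper:

```python
def pick_frontier(problems, pass_rate, low=0.05, high=0.5):
    """Keep questions the model can *almost* solve: it succeeds
    sometimes (pass rate above `low`) but still fails most of
    the time (pass rate below `high`)."""
    return [p for p in problems if low <= pass_rate[p] <= high]

# Usage: estimate pass_rate[p] from, say, 16 samples per problem,
# then spend extra training steps on pick_frontier(problems, pass_rate).
```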
This paper builds a new test called AgentIF-OneDay that checks if AI helpers can follow everyday instructions the way people actually give them.
DeepSeek-OCR 2 teaches a computer to “read” pictures of documents in a smarter order, more like how people read.
LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.
WorldVQA is a new test that checks if multimodal AI models can correctly name what they see in pictures without doing extra reasoning.
This paper finds that about 1 out of every 4 attention heads in autoregressive video diffusion models mostly looks only at the current frame and almost ignores the past, wasting memory and time.
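An illustrative sketch (not the paper's code) of how one might detect such heads: measure, per head, what fraction of attention mass lands on tokens of the frame currently being generated; everything here, including the 0.9 threshold, is an assumption:

```python
import numpy as np

def local_head_mask(attn, frame_ids, cur_frame, thresh=0.9):
    """attn: [heads, query_tokens, key_tokens] softmax attention weights.
    frame_ids: [key_tokens] frame index of each key token.
    Returns one boolean per head: True if at least `thresh` of its
    attention mass stays on the current frame, meaning its KV cache
    for past frames could plausibly be dropped."""
    on_current = attn[:, :, frame_ids == cur_frame].sum(axis=-1)  # [heads, queries]
    return on_current.mean(axis=1) >= thresh  # average over query tokens
```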
OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.
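A minimal sketch of the kind of action space such a GUI agent might emit; this schema is hypothetical, not OmegaUse's real interface:

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

@dataclass
class GUIAction:
    kind: Literal["click", "type", "scroll"]
    point: Optional[Tuple[int, int]] = None  # pixel coordinates on the screenshot
    text: Optional[str] = None               # what to type, for "type" actions
    dy: int = 0                              # scroll amount, for "scroll" actions
```

The loop is then: look at a screenshot, emit one structured action, look at the new screen, and repeat.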
Text-to-image models draw pretty pictures, but often put things in the wrong places or miss how objects interact.
DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.
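A minimal sketch of the idea (not DenseGRPO's actual algorithm): score intermediate images after each denoising step and reward each step by its improvement; `score` and the telescoping construction here are assumptions for illustration:

```python
def sparse_rewards(traj, score):
    # The usual setup: one final score, copied back to every step.
    final = score(traj[-1])
    return [final] * len(traj)

def dense_rewards(traj, score):
    """traj: intermediate images after each denoising step.
    Reward each step by how much it improved the scored image,
    so credit arrives at every step instead of only at the end."""
    s = [score(x) for x in traj]
    return [s[i] - (s[i - 1] if i > 0 else 0.0) for i in range(len(s))]
```

By construction the dense rewards telescope: they sum to the final score, so the total signal matches the sparse case while credit lands step by step.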