This paper argues that today's content-generation AIs make great-looking pictures and videos but often miss what people actually want, creating a big Intent-Execution Gap.
Binary right/wrong rewards for training reasoning in large language models are hard to design and often too sparse to learn from.
AOrchestra is like a smart conductor that builds the right mini-helpers (sub-agents) on demand to solve big, multi-step tasks.
CL-bench is a new test that checks whether AI can truly learn new things from the information you give it right now, not just from what it memorized before.
RLAnything is a new reinforcement learning (RL) framework that trains three things at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).
The paper tackles a new kind of search called Wide Research, where an AI must gather lots of related facts under complex rules and put them into a clean table.
Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.
WildGraphBench is a new test that checks how well GraphRAG systems find and combine facts from messy, real-world web pages.
This paper builds a live challenge that tests how well Deep Research Agents (DRAs) can write expert-level Wikipedia-style articles.
This paper teaches talking avatars not just to speak, but also to look around their scene and interact with nearby objects exactly as a text instruction directs.
PolySAE is a new kind of sparse autoencoder that keeps a simple, linear way to find features but uses a smarter decoder that can multiply features together.
PaperBanana is a team of AI helpers that turns a paper’s method text and caption into a clean, accurate, publication-ready figure.