This paper builds a live challenge that tests how well Deep Research Agents (DRAs) can write expert-level Wikipedia-style articles.
This paper shows a new way to predict what a phone screen will look like after you tap or scroll: generate web code (like HTML/CSS/SVG) and then render it to pixels.
FS-Researcher is a two-agent system that lets AI do very long research by saving everything in a computer folder so it never runs out of memory.
This paper teaches multimodal AI models to not just read pictures but to also imagine and think with pictures inside their heads.
This paper teaches talking avatars not just to speak, but to look around their scene and handle nearby objects exactly as a text instruction says.
The paper introduces Rubric-ARM, a system that teaches two AI helpers—a rubric maker and a judge—to learn together using reinforcement learning so they can better decide which answers people would prefer.
EBISU is a new test that checks how well AI models understand Japanese finance, a language and domain where hints and special terms are common.
The paper studies how to teach a smaller language model using a bigger one by only focusing on the most useful bits instead of everything.
PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
This paper introduces XDLM, a single model that blends two popular diffusion styles (masked and uniform) so it both understands and generates text and images well.
This paper teaches AI to copy the hidden idea inside a picture (a visual metaphor) and reuse that idea on a brand‑new subject.
PolySAE is a new kind of sparse autoencoder that keeps a simple, linear way to find features but uses a smarter decoder that can multiply features together.