The paper builds a Computer-Using World Model (CUWM) that lets an AI "imagine" what a desktop app (like Word, Excel, or PowerPoint) will look like after a click or keystroke, before doing it for real.
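The summary's "imagine before acting" idea can be sketched as a tiny toy: instead of clicking in the real app, the agent asks a model for the next screen state. Everything here (`ScreenState`, `imagine`, `toy_model`) is a hypothetical illustration, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ScreenState:
    description: str  # what the imagined screen currently shows

def imagine(state, action, world_model):
    """Roll an action out in the world model instead of the real app."""
    return world_model(state, action)

# Toy "world model": clicking Bold toggles bold text on the imagined screen.
def toy_model(state, action):
    if action == "click Bold":
        return ScreenState(description=state.description + " [bold]")
    return state

s0 = ScreenState("empty document")
s1 = imagine(s0, "click Bold", toy_model)  # imagined result, no real click
```

The point of the sketch is the separation of concerns: the agent can try several actions in imagination and only execute the one whose predicted screen looks right.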
This paper teaches AI agents to make smart choices about when to explore for more information and when to act right away.
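The explore-versus-act tradeoff this summary describes is classically illustrated by an epsilon-greedy rule: sometimes gather more information (a random try), otherwise act on the current best estimate. This is a generic textbook sketch, not the paper's actual decision method.

```python
import random

def choose_action(values, epsilon=0.1):
    """Epsilon-greedy: with probability epsilon, explore a random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(values))  # explore for more information
    return max(range(len(values)), key=lambda i: values[i])  # act right away
```

With `epsilon=0.0` the agent always acts on what it knows; with `epsilon=1.0` it always explores.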
Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.
This paper builds GUI-Owl-1.5, an AI that can use phones, computers, and web browsers like a careful human helper.
Sci-CoE is a two-stage training method that helps one language model learn to both solve science problems and check those solutions with very little labeled data.
This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.
DataChef teaches a large language model to be a smart data chef: it plans and codes full data pipelines that turn messy datasets into great training meals for other models.
Long texts overwhelm many language models, which forget important bits and slow down as the context grows.
This paper introduces SecCoderX, a way to teach code-writing AIs to be secure without breaking what the code is supposed to do.
This paper teaches a computer to find buttons, text, and icons on screens so it can click and type in the right places, a skill called GUI grounding.
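GUI grounding, as described above, boils down to mapping an instruction like "click submit" to screen coordinates. A minimal sketch, assuming a hypothetical list of labeled bounding boxes (the real paper works from pixels, not labels):

```python
def ground(target, elements):
    """Pick the on-screen element whose label matches the instruction
    and return the center point to click. `elements` is a list of
    (label, (x, y, width, height)) tuples -- an assumed format."""
    for label, (x, y, w, h) in elements:
        if target.lower() in label.lower():
            return (x + w // 2, y + h // 2)  # click the element's center
    return None  # nothing on screen matches
```

Usage: `ground("submit", [("Submit button", (10, 20, 100, 30))])` returns the center of that button's box.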
V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.
BudgetMem is a way for AI helpers to build and use memory on the fly, picking how much thinking to spend so answers are both good and affordable.
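The "pick how much thinking to spend" idea can be sketched as budget-gated memory lookup: a cheap budget consults only the top memory hit, a generous one consults more. The function name, matching rule, and thresholds are all illustrative assumptions, not BudgetMem's actual mechanism.

```python
def answer_with_budget(question, memory, budget):
    """Hypothetical sketch: retrieval depth scales with the budget.
    Low budget -> consult 1 memory entry; higher budget -> up to 5."""
    k = 1 if budget < 1.0 else 5
    words = question.lower().split()
    # Naive keyword match stands in for a real retriever.
    hits = [m for m in memory if any(w in m.lower() for w in words)]
    return hits[:k]
```

The design choice being illustrated: quality and cost trade off through a single knob (`k`), so easy questions stay cheap while hard ones can pay for deeper recall.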