ATLAS is a system that picks the best mix of AI models and helper tools for each question, instead of using just one model or a fixed tool plan.
Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.
ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
The paper teaches language models using extra “language homework” made from the same raw text so they learn grammar and meaning, not just next-word guessing.
This paper fixes a common problem in multimodal AI: models can understand pictures and words well but stumble when asked to create matching images.
This paper shows that training a language model with reinforcement learning on just one carefully designed example can boost reasoning across many school subjects, not just math.
Large reasoning models can often find the right math answer in their “head” before finishing their written steps, but this works best in languages with lots of training data like English and Chinese.
VINO is a single AI model that can make and edit both images and videos by listening to text and looking at reference pictures and clips at the same time.
Falcon-H1R is a small (7B-parameter) AI model that reasons really well without needing giant computers.
OpenRT is a big, open-source test bench that safely stress-tests AI models that handle both text and images.
This paper introduces MOSS Transcribe Diarize, a single model that writes down what people say in a conversation, tells who said each part, and marks the exact times, all in one go.
SpaceTimePilot is a video AI that lets you steer both where the camera goes (space) and how the action plays (time) from one input video.