MobilityBench is a large, carefully built benchmark that checks how well AI helpers can plan real-world routes using natural language and map tools.
Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.
Searching through videos, images, and long documents is powerful but gets very expensive when every tiny piece is stored separately.
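To see why storing every small piece separately gets expensive, here is a rough back-of-the-envelope calculation. All the numbers (embedding size, pieces per second) are illustrative assumptions, not figures from the paper.

```python
# Illustrative arithmetic only: every constant below is an assumption,
# not a number from any particular system.
EMBED_DIM = 1024          # floats per stored embedding vector (assumed)
BYTES_PER_FLOAT = 4       # float32
PIECES_PER_HOUR = 36_000  # e.g. 10 stored pieces per second of video (assumed)

bytes_per_hour = PIECES_PER_HOUR * EMBED_DIM * BYTES_PER_FLOAT
bytes_per_10k_hours = bytes_per_hour * 10_000

print(f"{bytes_per_hour / 1e9:.2f} GB per hour")        # ~0.15 GB
print(f"{bytes_per_10k_hours / 1e12:.2f} TB for 10k h")  # ~1.47 TB
```

Even with these modest assumptions, a ten-thousand-hour video library needs terabytes just for the index, which is the cost pressure the summary is pointing at.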
The paper introduces CHAIN, a hands-on 3D playground that tests if AI can not only see objects but also plan and act under real physics.
BBQ is a text-to-image model that lets you place objects exactly where you want using numeric bounding boxes and color them with exact RGB values.
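To make the idea concrete, here is a hypothetical sketch of what such a layout-conditioned input could look like: each object carries a numeric bounding box and an exact RGB color, as the summary describes. The field names and coordinate convention are illustrative assumptions, not BBQ's actual API.

```python
# Hypothetical layout spec: boxes are normalized [x0, y0, x1, y1],
# colors are exact 0-255 RGB triples. Field names are illustrative,
# not taken from the BBQ paper.
layout = {
    "prompt": "a red ball on a blue table",
    "objects": [
        {"name": "ball",  "box": [0.40, 0.55, 0.60, 0.75], "rgb": [220, 30, 30]},
        {"name": "table", "box": [0.10, 0.60, 0.90, 0.95], "rgb": [40, 60, 200]},
    ],
}

def validate(layout):
    """Check each box is normalized with x0 < x1, y0 < y1, and RGB in 0-255."""
    for obj in layout["objects"]:
        x0, y0, x1, y1 = obj["box"]
        assert 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0
        assert all(0 <= c <= 255 for c in obj["rgb"])
    return True

print(validate(layout))  # True
```

The point of numeric boxes and RGB values is that placement and color become checkable facts, so a generated image can be scored against the spec instead of judged by eye.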
NanoKnow is a new benchmark that checks whether a language model’s answers come from what it saw during training or from extra text we give it at question time.
This paper put real AI agents into a safe, live playground and asked expert testers to mess with them to see what breaks.
SkillOrchestra is a new way to make teams of AI models and tools work together by thinking in terms of skills, not just picking one big model for everything.
RoboCurate is a way to make better robot training videos by checking if the actions in a generated video actually match what a robot would do in a simulator.
OCR means transcribing a page exactly as written, and that strictness makes it a good fit for fast, parallel text generation.
AI helpers often don’t know new users’ tastes and can’t keep up when those tastes change.
The study tested how an in-car AI helper should talk while it works on long, multi-step tasks.