Phi-4-reasoning-vision-15B is a small, open-weight AI model that understands pictures and text together and is especially good at math, science, and operating computer screens.
SWE-CI is a new benchmark that tests how well AI coding agents can keep a codebase healthy over many changes, not just fix one bug.
This paper shows that teaching AI to first draw a simple map of a text (nodes and links) before answering questions makes it smarter and more reliable.
Scientists want AI to propose brand-new hypotheses directly from a research background, but training a model to do this end-to-end is mathematically intractable because the search space explodes combinatorially.
InfinityStory is a new system that can make very long videos (even hours) where the world stays the same and characters transition smoothly between shots.
Utonia is a single brain (encoder) that learns from many kinds of 3D point clouds, like indoor rooms, outdoor streets, tiny toys, and even city maps.
MIBURI is a system that makes a talking digital character move its body and face expressively in real time while it speaks.
This paper turns a popular image-guidance trick (Classifier-Free Guidance) into a feedback-control problem, just like keeping a car steady in its lane.
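To make the lane-keeping analogy concrete, here is a minimal sketch of the standard classifier-free guidance update, plus a hypothetical proportional controller that adjusts the guidance scale from a feedback signal. The CFG formula is the widely used one; the controller (`feedback_scale`, its `error` input, and the gain `k_p`) is an illustrative assumption, not the paper's actual method.

```python
import numpy as np

def cfg_step(eps_uncond, eps_cond, scale):
    # Standard classifier-free guidance: push the denoising
    # prediction away from the unconditional branch.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def feedback_scale(scale, error, k_p=0.5):
    # Hypothetical proportional controller: nudge the guidance
    # scale based on a tracking error, like steering corrections
    # that keep a car centered in its lane.
    return scale + k_p * error

# Toy example: unconditional and conditional noise predictions.
eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = cfg_step(eps_u, eps_c, scale=7.5)

# If the output drifted from the target, the controller would
# lower (or raise) the scale on the next denoising step.
new_scale = feedback_scale(7.5, error=-1.0)
```

Viewing the scale as a control input, rather than a fixed hyperparameter, is what lets feedback-control tools (setpoints, error signals, stability analysis) apply to image guidance.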
The paper trains one model from scratch to both read text and see images/videos, instead of starting from a language-only model.
This paper builds UniG2U-Bench, a big test to find out when making pictures (generation) actually helps models understand pictures and text together.
Agentic AIs don’t just chat; they plan, use tools, and take many steps, so one wrong click can cause real harm.
This paper shows that code-writing AI agents can take an existing math problem and automatically turn it into a new, harder one while keeping it solvable.