This paper teaches vision-language models to reason about pictures using puzzles instead of expensive human labels.
CRISP turns a normal phone video of a person into a clean 3D world and a virtual human that can move in it without breaking physics.
Robots usually learn by copying many demonstrations, which is expensive and makes them brittle when conditions change.
SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, much as people skim or rewind long videos.
Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.
ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.
QwenLong-L1.5 is a training recipe that helps AI read and reason over very long documents by improving the data it learns from, the way it is trained, and how it remembers important information.
DentalGPT is a specialized AI that looks at dental images and text together and explains what it sees, much as a junior dentist would.
This paper builds a math problem–solving agent, Intern-S1-MO, that thinks in multiple rounds and remembers proven mini-results called lemmas so it can solve very long, Olympiad-level problems.
This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.
Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.
MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step, not just sound caring.