This paper introduces MMSI-Video-Bench, a large, carefully human-crafted benchmark that tests how well AI understands space and motion in videos.
This paper studies how a newer kind of language model, called a discrete diffusion language model (DLM), gets better as we give it more data, bigger models, and more compute.
This paper asks whether generation training benefits more from an encoder’s big-picture meaning (global semantics) or from how features are arranged across space (spatial structure).
The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.
Big AI models often write very long step-by-step solutions, but typical checkers either verify only the final answer or get lost in the long chain of steps.
This paper builds a math problem–solving agent, Intern-S1-MO, that thinks in multiple rounds and remembers proven mini-results called lemmas so it can solve very long, Olympiad-level problems.
SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.
Diffusion models sometimes copy training images too closely, which can be a privacy and copyright problem.
LEO-RobotAgent is a simple but powerful framework that lets a language model think, plan, and operate many kinds of robots using natural language.
This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.
T-pro 2.0 is an open Russian-language model that can either answer quickly or reason step by step, so you can choose speed or accuracy as the task requires.
Long texts make standard attention in large language models very slow, because attention compares every word with every other word, so the cost grows quadratically with the length of the text.
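That quadratic cost is easy to see in a minimal sketch of naive self-attention (plain NumPy, written for illustration only, not taken from any paper): the score matrix has one entry per pair of tokens, so doubling the text length quadruples the work.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Naive self-attention: builds an n-by-n score matrix, so O(n^2) work."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (n, n): every token vs. every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                            # (n, d) weighted sum of values

# Toy example: 8 tokens with 4-dimensional features.
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)
```

With 8 tokens the score matrix has 8 × 8 = 64 entries; with 16 tokens it has 256, which is why long contexts are where standard attention hurts.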