Large language models usually line words up in fixed-order slots, which can waste mental energy and make it harder to find the important parts of a long or noisy text.
Vector Prism helps computers animate SVG images by first discovering which tiny shapes belong together as meaningful parts.
SS4D is a new AI model that turns a short single-camera video into a full 3D object that moves over time (that’s 4D), and it does this in about 2 minutes.
Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.
Reinforcement learning agents often represent the world in flat (Euclidean) space, but many decision problems look more like branching trees, which fit curved, hyperbolic space better.
SonicMoE makes Mixture-of-Experts (MoE) models train faster and use less memory by redesigning how data is moved and computed on GPUs.
Autoregressive (AR) models write one word at a time, which is accurate but slow, especially when your computer or GPU can’t keep many tasks in memory at once.
HyperVL is a small but smart model that understands images and text, designed to run fast on phones and tablets.
OpenDataArena (ODA) is a fair, open platform that measures how valuable different post‑training datasets are for large language models by holding everything else constant.
FINERWEB is a new, carefully built dataset pipeline that teaches computers to spot names of people, places, and more across 91 languages and 25 writing systems.
SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, just as people skim or rewind long videos.
LitePT is a new AI backbone for 3D point clouds that uses convolutions in early layers and attention in later layers to be both fast and accurate.