FutureOmni is the first benchmark that tests whether multimodal AI models can predict what happens next from both sound and video, rather than just explaining what already happened.
TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.
JavisGPT is a single AI model that can both understand sounding videos (audio and video together) and create new ones that stay in sync.
Streamo is a real-time video assistant that knows when to stay quiet, when to wait, and when to speak—while a video is still playing.
OpenDataArena (ODA) is a fair, open platform that measures how valuable different post‑training datasets are for large language models by holding everything else constant.
DentalGPT is a specialized AI that examines dental images and text together and explains its findings the way a junior dentist would.
Time-series data are numbers tracked over time, like temperature each hour or traffic each day, and turning them into clear written explanations usually requires experts.
Before this work, most big language models generated text one token at a time (autoregressive decoding), which made them slow and hard to parallelize.
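To make the bottleneck concrete, here is a minimal sketch (with a toy stand-in for the model, not any paper's actual code) of why autoregressive decoding resists parallelization: each new token depends on every token before it, so generation is a strictly sequential loop.

```python
# Toy autoregressive decoding loop. The point: token t+1 cannot be
# computed until token t exists, so the steps cannot run in parallel.

def toy_next_token(context: list[int]) -> int:
    # Hypothetical stand-in for a real language model forward pass.
    return (sum(context) * 31 + 7) % 50

def autoregressive_decode(prompt: list[int], n_new: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(n_new):          # one full forward pass per new token
        tokens.append(toy_next_token(tokens))
    return tokens

print(autoregressive_decode([1, 2, 3], 5))
```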
InfiniteVL is a vision-language model that mixes two ideas: local focus with Sliding Window Attention and long-term memory with a linear module called Gated DeltaNet.
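A minimal NumPy sketch of the two ingredients named above; the single head, the shapes, and the gating constants are illustrative assumptions, and the delta-rule step follows the form commonly written in the DeltaNet literature rather than InfiniteVL's exact code.

```python
# Sliding-window attention restricts each token to its last W neighbors
# (local focus); a gated delta rule maintains a fixed-size state S as a
# running linear-attention memory (long-term memory).
import numpy as np

def sliding_window_mask(T: int, W: int) -> np.ndarray:
    # True where query i may attend to key j: causal (j <= i) and within
    # the last W positions (i - j < W).
    i, j = np.indices((T, T))
    return (j <= i) & (i - j < W)

def gated_delta_step(S, k, v, alpha, beta):
    # One recurrence step in the commonly published form:
    # S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
    d = k.shape[0]
    return alpha * S @ (np.eye(d) - beta * np.outer(k, k)) + beta * np.outer(v, k)

S = np.zeros((4, 4))                     # state of fixed size (d_v, d_k)
for _ in range(8):                       # stream of 8 toy tokens
    k = np.random.randn(4); k /= np.linalg.norm(k)
    v = np.random.randn(4)
    S = gated_delta_step(S, k, v, alpha=0.95, beta=0.5)

print(sliding_window_mask(5, 3).astype(int))  # local attention pattern
print(S.round(2))                             # compact long-term memory
```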
Diffusion language models (dLLMs) can write all parts of an answer in parallel, but they usually need many small denoising (cleanup) steps, which makes them slow.
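As a toy illustration (the "scorer" below is random, not a trained dLLM), this is the shape of that loop: everything starts masked, each denoising pass fills in only a few positions in parallel, and the answer is finished only after many passes.

```python
# Toy masked-diffusion refinement loop. Each pass could score all masked
# positions at once (parallel), but only a few are resolved per pass, so
# many passes are needed to complete one answer.
import random

MASK = "_"
VOCAB = list("abcdefgh")

def denoise_step(seq, k):
    # Hypothetical scorer: pick k masked positions and fill them in,
    # here at random. A real dLLM would use model confidences instead.
    masked = [i for i, t in enumerate(seq) if t == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        seq[i] = random.choice(VOCAB)
    return seq

seq, steps = [MASK] * 16, 0
while MASK in seq:                 # many small cleanup steps
    seq = denoise_step(seq, k=2)   # only 2 tokens resolved per pass
    steps += 1
print("".join(seq), "finished in", steps, "steps")
```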