This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to focus only on the most useful information using a smart, layered search.
This paper teaches a vision-language model to first find objects in real 3D space (not just 2D pictures) and then reason about where things are.
StageVAR makes image-generating AI much faster by recognizing that early steps set the meaning and structure, while later steps just polish details.
The paper defines Scientific General Intelligence (SGI) as an AI that can do science like a human scientist across the full loop: study, imagine, test, and understand.
This paper builds a big, fair test called Hearing to Translate to check how well different speech translation systems work in the real world.
This paper organizes how AI agents learn and improve into one simple map with four roads: A1, A2, T1, and T2.
This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.
INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (about 12B parameters active per token) trained with large-scale reinforcement learning, and it beats many bigger models on math, coding, science, and reasoning tests.
ModelTables is a giant, organized collection of tables that describe AI models, gathered from Hugging Face model cards, GitHub READMEs, and research papers.
TurboDiffusion speeds up video diffusion models by about 100–200 times while keeping video quality comparable.
This paper asks whether we are judging AI answers the right way and introduces Sage, a new way to test AI judges without using human-graded answers.
Spatia is a video generator that keeps a live 3D map of the scene (a point cloud) as its memory while making videos.