HySparse is a new way for AI models to pay attention that mixes a few full attention layers with many fast, memory‑saving sparse layers.
This paper speeds up how AI models read very long texts by carefully choosing which words (tokens) to focus on at each step.
The paper makes long video generation much faster and lighter on memory by cutting out repeated work in attention.
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
AI programs called LLMs can now help write the tiny, super-fast pieces of code (kernels) that make GPUs run AI models efficiently.
HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.
Transformers are powerful but slow on long inputs because regular self-attention compares every token with every other token, so its cost grows quadratically with sequence length.
FOCUSUI makes computer-using AI faster and still accurate by looking only at the important parts of a screen.
InfiniteVGGT is a streaming 3D vision system that can keep working forever on live video without running out of memory.
CASA is a new way to mix images and text inside a language model that keeps speed and memory low while keeping accuracy high.
Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.
This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to focus only on the most useful information using a smart, layered search.
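Several of the entries above start from the same observation: regular self-attention compares every token with every other token. The sketch below is a minimal, pure-Python illustration of that cost, not code from any of the listed papers. The function names `full_attention` and `count_comparisons` and the toy token values are made up for the example.

```python
import math


def full_attention(queries, keys, values):
    """Naive single-head attention over lists of equal-length vectors.

    Returns (outputs, comparisons), where comparisons counts how many
    query-key dot products were computed.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    comparisons = 0
    outputs = []
    for q in queries:
        # Score this query against every key: one comparison per pair.
        scores = []
        for k in keys:
            scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
            comparisons += 1
        # Numerically stable softmax over the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors.
        out = [0.0] * len(values[0])
        for w, v in zip(weights, values):
            for j, vj in enumerate(v):
                out[j] += (w / total) * vj
        outputs.append(out)
    return outputs, comparisons


def count_comparisons(n_tokens, dim=4):
    """Run self-attention on n_tokens toy vectors and count the comparisons."""
    tokens = [[float((i + j) % 3) for j in range(dim)] for i in range(n_tokens)]
    _, comparisons = full_attention(tokens, tokens, tokens)
    return comparisons


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, count_comparisons(n))
```

Doubling the number of tokens from 8 to 16 raises the comparison count from 64 to 256, a fourfold increase. Sparse attention methods like those summarized above aim to skip most of these pairs while keeping the ones that matter.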