This paper introduces XDLM, a single model that blends two popular diffusion styles (masked and uniform) so it can both understand and generate text and images well.
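As a rough intuition only (not the paper's actual formulation), blending the two styles can be pictured as a forward-noising step where a corrupted token becomes a mask token with some probability and a random token otherwise; the function and the `lam` knob below are illustrative assumptions:

```python
import torch

def corrupt(tokens, t, vocab_size, mask_id, lam=0.5):
    """Illustrative noising step interpolating the two styles: with
    probability t a token is corrupted; a corrupted token becomes
    [MASK] with probability lam (masked style) and a uniformly
    random token otherwise (uniform style)."""
    hit = torch.rand(tokens.shape) < t        # which tokens get corrupted
    to_mask = torch.rand(tokens.shape) < lam  # masked vs. uniform branch
    rand_tok = torch.randint(0, vocab_size, tokens.shape)
    noised = torch.where(to_mask, torch.full_like(tokens, mask_id), rand_tok)
    return torch.where(hit, noised, tokens)

x = torch.randint(0, 100, (2, 8))
print(corrupt(x, t=0.5, vocab_size=100, mask_id=100))
```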
This paper teaches AI to copy the hidden idea inside a picture (a visual metaphor) and reuse that idea on a brand‑new subject.
PolySAE is a new kind of sparse autoencoder that keeps a simple, linear way to find features but uses a smarter decoder that can multiply features together.
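A minimal sketch of that idea, assuming a ReLU-sparsified linear encoder and a decoder that adds pairwise feature products on top of the usual linear reconstruction (the layer names and the exact interaction form are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class PolySAESketch(nn.Module):
    """Linear encoder keeps features easy to find; the decoder also
    reads out products of feature pairs, so combinations of features
    can reconstruct things no single feature can."""
    def __init__(self, d_model, n_feats, k_pairs=64):
        super().__init__()
        self.enc = nn.Linear(d_model, n_feats)
        self.dec = nn.Linear(n_feats, d_model, bias=False)
        # a small set of feature pairs whose product also decodes
        self.pair_idx = torch.randint(0, n_feats, (k_pairs, 2))
        self.pair_dec = nn.Linear(k_pairs, d_model, bias=False)

    def forward(self, x):
        f = torch.relu(self.enc(x))  # sparse features, found linearly
        pairs = f[:, self.pair_idx[:, 0]] * f[:, self.pair_idx[:, 1]]
        return self.dec(f) + self.pair_dec(pairs), f

model = PolySAESketch(d_model=512, n_feats=4096)
recon, feats = model(torch.randn(4, 512))
```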
The paper shows that a model that looks great after supervised fine-tuning (SFT) can end up worse after the same subsequent reinforcement learning (RL) than a model that looked weaker at SFT time.
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
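A toy picture of the fix: compute the KV cache for the shared prompt prefix once and hand it to every agent, instead of letting each adapter re-run prefill on identical tokens (this sketch ignores how the real system handles per-adapter differences in the KV):

```python
class PrefixKVCache:
    """Toy prefix cache: the first agent to ask for a prefix pays the
    prefill cost; later agents reuse the stored KV tensors."""
    def __init__(self, prefill_fn):
        self.prefill_fn = prefill_fn  # tokens -> KV tensors (stand-in)
        self.store = {}

    def get(self, prefix_tokens):
        key = tuple(prefix_tokens)
        if key not in self.store:     # only the first agent prefills
            self.store[key] = self.prefill_fn(prefix_tokens)
        return self.store[key]

cache = PrefixKVCache(prefill_fn=lambda toks: f"KV({len(toks)} tokens)")
shared_prompt = [101, 2023, 2003, 1037, 2613]
for agent in ["planner", "coder", "critic"]:
    kv = cache.get(shared_prompt)     # prefill runs once, not three times
```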
The paper discovers a tiny, special group of neurons inside large language models (LLMs) that act like a reward system in the human brain.
Green-VLA is a step-by-step training recipe that teaches one model to see, understand language, and control many kinds of robots safely and efficiently.
This paper teaches a model to make its own helpful hints (sub-questions) and then use those hints to learn better with reinforcement learning that checks answers automatically.
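The "checks answers automatically" part is a verifiable reward; here is a minimal sketch, where the `Answer:` format and the exact-match check are assumptions rather than the paper's exact setup:

```python
def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the ground
    truth, else 0.0. The self-generated sub-questions earn reward
    only through the final answer they lead to."""
    answer = completion.split("Answer:")[-1].strip()
    return 1.0 if answer == gold_answer.strip() else 0.0

print(verifiable_reward("Sub-question: what is 6*7? Answer: 42", "42"))  # 1.0
```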
Training big language models works best when you mix the right kinds of data (general, math, code), but finding the best mix used to be slow and very expensive.
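Concretely, a mixture is just a set of sampling weights over data domains; the expensive part is searching for good weights, since each candidate traditionally meant its own training run. A tiny sketch of mixture sampling, with made-up weights:

```python
import random

def sample_domain(mix):
    """Pick which domain the next training example comes from,
    proportional to the mixture weights."""
    domains, weights = zip(*mix.items())
    return random.choices(domains, weights=weights, k=1)[0]

mix = {"general": 0.6, "math": 0.2, "code": 0.2}  # hypothetical mixture
print(sample_domain(mix))
```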
Big AI models do great in the lab but stumble in the real world because the world keeps changing.
VoxServe is a new serving system that makes voice AIs respond fast and smoothly when streaming audio to users.
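A toy illustration of why streaming matters for perceived latency: the listener hears the first chunk after one chunk's worth of synthesis, not after the whole utterance (the chunk timing and generator below are assumptions, not VoxServe's actual scheduler):

```python
import time

def stream_speech(synthesize_chunk, n_chunks):
    """Emit each audio chunk as soon as it is ready instead of
    waiting for the full utterance to finish."""
    for i in range(n_chunks):
        yield synthesize_chunk(i)

def fake_chunk(i):  # pretend synthesis takes 50 ms per chunk
    time.sleep(0.05)
    return f"chunk-{i}"

start = time.time()
for i, chunk in enumerate(stream_speech(fake_chunk, 5)):
    if i == 0:
        print(f"first audio after {time.time() - start:.2f}s")  # ~0.05s, not ~0.25s
```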
PaperBanana is a team of AI helpers that turns a paper’s method text and caption into a clean, accurate, publication-ready figure.