The paper shows that a model that looks great after supervised fine-tuning (SFT) can actually end up worse after the same reinforcement learning (RL) training than a model that looked weaker at SFT time.
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
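A minimal sketch of the wasted-work problem in plain Python (the agent roles, the cache structure, and the shared prompt are illustrative assumptions, not the paper's implementation): since the agents share most of their prompt, the KV cache for that shared prefix only needs to be built once.

```python
import hashlib

class SharedPrefixKVCache:
    """Toy KV cache keyed by the prompt prefix, shared across LoRA agents."""

    def __init__(self):
        self._cache = {}          # prefix hash -> placeholder for the real KV tensors
        self.recomputations = 0   # counts how often the prefix was actually rebuilt

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get_or_build(self, prefix: str):
        key = self._key(prefix)
        if key not in self._cache:
            # In a real system this is the expensive attention pass over the prefix.
            self._cache[key] = f"KV({len(prefix.split())} prefix tokens)"
            self.recomputations += 1
        return self._cache[key]

shared_system_prompt = "You are part of a multi-agent team working on one task together."
cache = SharedPrefixKVCache()

# Each LoRA agent has a different role, but the same long shared prefix.
for role in ["planner", "coder", "reviewer"]:
    kv = cache.get_or_build(shared_system_prompt)
    print(role, "reuses", kv)

print("prefix built", cache.recomputations, "time(s) instead of 3")
```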
The paper discovers a tiny, special group of neurons inside large language models (LLMs) that act like a reward system in the human brain.
Green-VLA is a step-by-step training recipe that teaches one model to see, understand language, and move many kinds of robots safely and efficiently.
This paper teaches a model to make its own helpful hints (sub-questions) and then use those hints to learn better with reinforcement learning that checks answers automatically.
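A hedged sketch of the "checks answers automatically" part (the `Answer:` extraction pattern and the 0/1 reward values are assumptions for illustration, not the paper's exact reward):

```python
import re

def verifiable_reward(model_output: str, gold_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the final stated answer matches, else 0.0."""
    # Assume the model ends its reasoning with a line like "Answer: 36".
    match = re.search(r"Answer:\s*([-\d.]+)", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == gold_answer else 0.0

# The model first writes its own hints (sub-questions), then answers with their help.
rollout = (
    "Sub-question 1: How many apples per box? 12\n"
    "Sub-question 2: How many boxes? 3\n"
    "Answer: 36"
)
print(verifiable_reward(rollout, "36"))  # 1.0 -> this rollout gets reinforced
```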
Training big language models works best when you mix the right kinds of data (general, math, code), but finding the best mix used to be slow and very expensive.
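A minimal sketch of what a "data mix" means in practice (the domain names and weights below are made up for illustration): each training example is drawn from a domain in proportion to the mixture weights, and the hard, expensive part is finding good weights.

```python
import random

# Hypothetical mixture weights over data domains (must sum to 1).
mixture = {"general_web": 0.60, "math": 0.25, "code": 0.15}

def sample_domain(mix: dict[str, float]) -> str:
    """Pick the domain for the next training example according to the mixture."""
    domains, weights = zip(*mix.items())
    return random.choices(domains, weights=weights, k=1)[0]

counts = {d: 0 for d in mixture}
for _ in range(10_000):
    counts[sample_domain(mixture)] += 1
print(counts)  # roughly 6000 / 2500 / 1500 examples per domain
```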
Big AI models do great in the lab but stumble in the real world because the world keeps changing.
VoxServe is a new serving system that lets voice AIs respond quickly and stream audio smoothly to users.
PaperBanana is a team of AI helpers that turns a paper’s method text and caption into a clean, accurate, publication-ready figure.
This paper teaches AI teams to get better by scoring every move they make, not just the final answer.
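A toy contrast between outcome-only scoring and scoring every move (the step descriptions, step scores, and trajectory format are illustrative assumptions, not the paper's data):

```python
# A trajectory is a list of (step_description, step_is_good) pairs plus a final-answer flag.
trajectory = [
    ("agent A: retrieve the relevant document", True),
    ("agent B: misread a number from the document", False),
    ("agent A: catch the mistake and correct it", True),
]
final_answer_correct = True

def outcome_reward(final_correct: bool) -> float:
    """Only the final answer matters; mistakes along the way are invisible."""
    return 1.0 if final_correct else 0.0

def process_rewards(steps) -> list[float]:
    """Every move gets its own score, so credit and blame land on specific steps."""
    return [1.0 if good else 0.0 for _, good in steps]

print(outcome_reward(final_answer_correct))  # 1.0 -> the bad step is never penalized
print(process_rewards(trajectory))           # [1.0, 0.0, 1.0] -> the bad step stands out
```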
Deep search agents can plan and browse the web in many steps, but they often fail because they don’t notice when their own thinking drifts off-track.
Chain-of-Thought (CoT) makes AI think step by step, but it is slow because it writes many tokens one by one.
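Rough arithmetic behind "slow because it writes many tokens one by one" (the token counts and per-token latency are assumed numbers, just to show how linearly the cost grows):

```python
# Autoregressive decoding is sequential: total latency grows linearly with output length.
per_token_latency_s = 0.02    # assumed 20 ms per generated token

short_answer_tokens = 20      # answering directly
cot_answer_tokens = 500       # step-by-step reasoning before the answer

print("direct:", short_answer_tokens * per_token_latency_s, "s")  # 0.4 s
print("CoT:   ", cot_answer_tokens * per_token_latency_s, "s")    # 10.0 s
```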