Agentic AIs don’t just chat; they plan, use tools, and take many steps, so one wrong click can cause real harm.
This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.
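As a hedged illustration only (not this paper's method), the trade-off can be pictured as a confidence-gated fallback: keep the specific name when the model is sure, otherwise back off to the broader class. The hierarchy, threshold, and names below are invented.

```python
# Toy sketch: prefer the fine-grained label, fall back to its parent when unsure.
LABEL_PARENT = {"golden retriever": "dog", "tabby": "cat"}  # invented hierarchy

def specific_or_safe(label_probs: dict[str, float], threshold: float = 0.7) -> str:
    """Return the most confident fine-grained label, or its parent if unsure."""
    label, prob = max(label_probs.items(), key=lambda kv: kv[1])
    if prob >= threshold:
        return label                       # confident: keep the specific name
    return LABEL_PARENT.get(label, label)  # unsure: use the broader class

print(specific_or_safe({"golden retriever": 0.55, "tabby": 0.45}))  # -> "dog"
```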
This paper introduces HACRL, a way for different kinds of AI agents to learn together during training but still work alone during use.
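"Learn together during training but work alone during use" matches the common centralized-training, decentralized-execution pattern, sketched generically below; the critic, policies, and dimensions are placeholders, not HACRL's actual components.

```python
# Generic centralized-training / decentralized-execution sketch (not HACRL code).
import numpy as np

class Agent:
    def __init__(self, obs_dim: int, n_actions: int, rng: np.random.Generator):
        self.w = rng.normal(size=(obs_dim, n_actions))  # per-agent policy weights

    def act(self, obs: np.ndarray) -> int:
        """Execution: each agent decides from its OWN observation only."""
        return int(np.argmax(obs @ self.w))

def centralized_critic(all_obs: list, all_actions: list) -> float:
    """Training-only signal that may see every agent's observation and action."""
    return -float(sum(o.sum() for o in all_obs))  # placeholder joint value

rng = np.random.default_rng(0)
agents = [Agent(obs_dim=4, n_actions=3, rng=rng) for _ in range(2)]
obs = [rng.normal(size=4) for _ in agents]
acts = [a.act(o) for a, o in zip(agents, obs)]  # decentralized at deployment
joint_value = centralized_critic(obs, acts)     # shared signal during training
```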
This paper teaches AI models to judge how sure they are about an answer and to think again if they are not sure.
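A minimal sketch of that answer-then-reconsider loop, assuming a model call that returns an answer plus a self-reported confidence in [0, 1]; `generate`, the threshold, and the retry budget are all illustrative, not the paper's setup.

```python
import random

def generate(question: str, retry: bool = False) -> tuple[str, float]:
    """Stub model call; pretend retries are more careful, hence more confident."""
    conf = random.uniform(0.7, 1.0) if retry else random.uniform(0.0, 1.0)
    return "42", conf

def answer_with_reflection(question: str, threshold: float = 0.6,
                           max_tries: int = 3) -> str:
    answer, conf = generate(question)
    for _ in range(max_tries - 1):
        if conf >= threshold:
            break                                      # sure enough: stop
        answer, conf = generate(question, retry=True)  # not sure: think again
    return answer

print(answer_with_reflection("What is 6 * 7?"))
```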
CharacterFlywheel is a step‑by‑step loop that steadily improves chatty AI characters by learning from real conversations on Instagram, WhatsApp, and Messenger.
Reinforcement learning (RL) trains language models by letting them try answers and learn from rewards, but training is slow if we pick the wrong practice questions.
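One generic way to pick good practice questions is to favor those the model currently solves about half the time, where pass/fail rewards carry the most signal; the toy below shows that heuristic, which may or may not be this paper's criterion.

```python
# Toy question selection for RL practice (a generic heuristic, not the paper's).
def informativeness(success_rate: float) -> float:
    """Peaks at 0.5: always-solved or never-solved questions teach little."""
    return success_rate * (1.0 - success_rate)

history = {"q1": 1.0, "q2": 0.5, "q3": 0.0, "q4": 0.4}  # recent success rates
batch = sorted(history, key=lambda q: informativeness(history[q]), reverse=True)[:2]
print(batch)  # -> ['q2', 'q4']: near-50% questions get practiced first
```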
FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.
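One concrete reading of "follows strict rules" is validating extracted fields against a fixed format before passing them downstream; the invoice schema below is invented for illustration and is not FireRed-OCR's.

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # invented rule: ISO dates only

def rule_violations(fields: dict[str, str]) -> list[str]:
    """Return broken rules; an empty list means the output is usable as-is."""
    errors = []
    if not DATE_RE.match(fields.get("date", "")):
        errors.append("date must be YYYY-MM-DD")
    if not fields.get("total", "").replace(".", "", 1).isdigit():
        errors.append("total must be a number")
    return errors

print(rule_violations({"date": "2025-03-14", "total": "99.90"}))  # -> []
print(rule_violations({"date": "14/03/2025"}))  # -> both rules broken
```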
Longer explanations are not always better; the shape of thinking matters.
This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them choose better among answers they already know?
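The standard probe for exactly this question is pass@k: if RL lifts pass@1 but not pass@k at large k, the model is choosing better among answers it could already produce rather than gaining new ability. Below is the usual unbiased pass@k estimator; the numbers are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples is correct,
    given c correct answers observed among n samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=100, c=30, k=1))   # 0.30: what better answer-picking can lift
print(pass_at_k(n=100, c=30, k=10))  # ~0.98: ability that was already there
```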
This paper teaches image generators to place objects in the right spots by building a special teacher (a reward model) that focuses on spatial relationships.
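As a rough stand-in for such a teacher, here is a hand-written spatial check over bounding boxes; the paper's learned reward model would replace this rule, and the box format and relation name are assumptions.

```python
Box = tuple[float, float, float, float]  # assumed format: x1, y1, x2, y2

def left_of(a: Box, b: Box) -> bool:
    return a[2] <= b[0]  # a's right edge ends before b's left edge begins

def spatial_reward(boxes: dict[str, Box], subj: str, rel: str, obj: str) -> float:
    """Reward 1.0 if the requested relation holds in the generated image."""
    if rel == "left of" and subj in boxes and obj in boxes:
        return 1.0 if left_of(boxes[subj], boxes[obj]) else 0.0
    return 0.0  # unknown relation or missing object: no reward

boxes = {"cat": (10, 40, 80, 120), "sofa": (100, 30, 300, 140)}
print(spatial_reward(boxes, "cat", "left of", "sofa"))  # -> 1.0
```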
SLATE is a new way to teach AI to think step by step while using a search engine, giving feedback at each step instead of only at the end.
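A contrast sketch of sparse versus dense feedback on a short search trajectory: outcome-only training rewards nothing until the final answer, while step-level feedback scores each action on its own. The per-step judge here is an invented placeholder, not SLATE's.

```python
steps = ["rewrite the query", "search('capital of France')",
         "read the top result", "answer('Paris')"]

def step_score(step: str) -> float:
    """Invented per-step judge, e.g., 'did this step add useful evidence?'"""
    return 1.0 if "Paris" in step or "search" in step else 0.5

sparse = [0.0] * (len(steps) - 1) + [1.0]  # feedback only at the very end
dense = [step_score(s) for s in steps]     # feedback after every step
print(sparse)  # [0.0, 0.0, 0.0, 1.0]
print(dense)   # [0.5, 1.0, 0.5, 1.0]
```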
This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.
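A structural sketch wiring together the three pieces the summary names: fresh on-policy rollouts, reusable off-policy data in a replay buffer, and a plain-text memory the agent writes for itself and reads back into its prompt. Every name and the update step are illustrative, not the paper's API.

```python
import random

replay_buffer: list[tuple[str, float]] = []  # off-policy: past (trajectory, reward)
memory_notes: list[str] = []                 # self-written lessons, reused in prompts

def rollout(prompt: str) -> tuple[str, float]:
    """Stub environment interaction: returns a trajectory and its reward."""
    return f"trajectory under: {prompt[:40]}", random.random()

for episode in range(5):
    prompt = "Explore. Notes so far: " + "; ".join(memory_notes[-3:])
    traj, reward = rollout(prompt)        # on-policy sample from current prompt
    replay_buffer.append((traj, reward))  # keep it for off-policy reuse
    mix = [(traj, reward)] + random.sample(replay_buffer,
                                           k=min(2, len(replay_buffer)))
    # ...a policy update on the mixed on-/off-policy batch would go here...
    if reward < 0.5:                      # write a note for future episodes
        memory_notes.append(f"episode {episode}: low reward, try another approach")
```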