MemSifter is a smart helper that picks out just the right memories for a large AI model so the model doesn’t have to read through everything.
Reasoning Core is a tool that automatically creates a huge variety of logic and math puzzles, checks every answer with real solvers, and lets you smoothly dial the difficulty up or down.
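The idea of a generator with a difficulty dial and a solver check can be sketched as follows. This is an illustrative toy, not Reasoning Core's actual implementation: the puzzle family (arithmetic chains), the `difficulty` knob, and the use of Python's own evaluator as the "solver" are all assumptions made for the example.

```python
import random

def generate_puzzle(difficulty: int, rng: random.Random) -> dict:
    """Toy puzzle generator: build an arithmetic chain whose length
    scales with `difficulty`, then compute the ground-truth answer
    by evaluation (standing in for a real external solver)."""
    n_terms = 2 + difficulty          # difficulty dial: more terms = harder
    terms = [rng.randint(1, 9) for _ in range(n_terms)]
    ops = [rng.choice(["+", "-", "*"]) for _ in range(n_terms - 1)]
    expr = str(terms[0])
    for op, t in zip(ops, terms[1:]):
        expr += f" {op} {t}"
    answer = eval(expr)               # "solver" check: verified ground truth
    return {"question": f"Compute: {expr}", "answer": answer}

rng = random.Random(0)
easy = generate_puzzle(difficulty=1, rng=rng)   # short chain
hard = generate_puzzle(difficulty=6, rng=rng)   # longer chain, same recipe
```

Because every answer is produced by the solver rather than by a model, the training labels stay exact no matter how far the difficulty is dialed up.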
Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.
This paper shows, step by step, how to train a 1.36-billion-parameter science-focused language model directly from raw arXiv LaTeX files using only 2 A100 GPUs.
TAROT teaches code-writing AI the way good teachers teach kids: start at the right level and raise the bar at the right time.
This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
Big language models can get stuck after fine-tuning because they become too sure of themselves, so normal training stops helping.
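The mechanism behind "too sure of themselves" can be shown numerically. This is a generic sketch of why peaked output distributions stall training, not the paper's own analysis: when the softmax concentrates on one token, its entropy collapses and the cross-entropy gradient with respect to the logits (which equals `p - onehot`) shrinks toward zero.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats."""
    return float(-(p * np.log(p + 1e-12)).sum())

# A calibrated model spreads some probability mass around;
# an overconfident one puts nearly all of it on one token.
calibrated = softmax(np.array([2.0, 1.0, 0.5, 0.0]))
overconfident = softmax(np.array([20.0, 1.0, 0.5, 0.0]))

h_cal, h_over = entropy(calibrated), entropy(overconfident)

# Cross-entropy gradient w.r.t. the logits is (p - onehot):
# once p is already peaked on the target, the gradient vanishes
# and ordinary training has almost nothing left to push on.
target = np.array([1.0, 0.0, 0.0, 0.0])
grad_cal = calibrated - target
grad_over = overconfident - target
```

The overconfident distribution has near-zero entropy and a near-zero gradient, which is exactly the "normal training stops helping" regime the summary describes.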
Agents in vast, open-ended games often learn a little and then get stuck because the right next practice tasks are missing.
V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.
The paper finds a hidden symmetry inside GRPO’s advantage calculation that accidentally stops models from exploring promising new answers and from weighting easy versus hard problems appropriately as training progresses.
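The quantity being analyzed is GRPO's standard group-relative advantage: each sampled answer's reward is normalized by its group's mean and standard deviation. A minimal sketch of that calculation (the paper's symmetry finding itself is not reproduced here):

```python
import numpy as np

def grpo_advantages(rewards) -> np.ndarray:
    """Group-relative advantage used by GRPO: normalize each sampled
    answer's reward by the group's mean and std (epsilon for stability)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# With binary rewards the advantages are symmetric around zero:
# every correct answer in the group gets the same positive weight and
# every wrong one the same negative weight, and the weights depend only
# on the fraction of correct answers in the group.
adv = grpo_advantages([1, 0, 0, 0])  # one correct answer out of four
```

Note that the advantages always sum to zero within a group, which is the kind of built-in balance the paper's symmetry argument works from.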
LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.