MemSifter is a smart helper that picks the right memories for a big AI so it doesn’t have to read everything.
This paper shows, step by step, how to train a 1.36-billion-parameter science-focused language model directly from raw arXiv LaTeX files using only 2 A100 GPUs.
TAROT teaches code-writing AI the way good teachers teach kids: start at the right level and raise the bar at the right time.
This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
Big language models can get stuck after fine-tuning because they become too sure of themselves, so normal training stops helping.
Agents in vast, open-ended games often learn a little and then get stuck because the next good practice steps are missing.
V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.
The paper finds a hidden symmetry inside GRPO’s advantage calculation that accidentally stops models from exploring new good answers and from giving easy and hard problems the right amount of attention at the right stage of training.
LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.
When rewards are rare, a popular training method for language models (GRPO) often stops learning because every try in a group gets the same score, so there is nothing to compare.
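The failure mode in that summary falls straight out of GRPO’s group-normalized advantage: each sampled answer is scored relative to its group’s mean and standard deviation, so a group where every answer gets the same reward yields zero advantage everywhere and no gradient. A minimal sketch of that effect (a hypothetical standalone reimplementation for illustration, not code from any of these papers):

```python
# Group-relative advantage as used in GRPO-style training:
# advantage_i = (reward_i - group mean) / (group std + eps).
import statistics

def group_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# A mixed group gives a usable learning signal: winners get positive
# advantages, losers get negative ones.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))

# With sparse rewards, every answer in the group often scores 0, so every
# advantage is exactly 0 and the policy gradient vanishes.
print(group_advantages([0.0, 0.0, 0.0, 0.0]))  # -> [0.0, 0.0, 0.0, 0.0]
```

Since the advantages are mean-centered within each group, they always sum to (nearly) zero; learning only happens when answers in the same group disagree on reward.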
CoDiQ is a recipe for deliberately generating hard-but-solvable math and coding questions, with control over how hard they get as you generate them.