Longer explanations are not always better; the shape of thinking matters.
The paper shows that, when teaching a reasoning AI with step-by-step examples, repeating a small set many times can beat using a huge set only once.
The paper shows that a model that looks great after supervised fine-tuning (SFT) can actually do worse after the same reinforcement learning (RL) training than a model that looked weaker at SFT time.
Big models are often used to grade AI answers, but they are expensive and slow, and they depend too much on carefully crafted prompts.
Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.
OCRVerse is a new AI model that can read both plain text in documents and the visual structures in charts, webpages, and science plots, all in one system.
VisGym is a playground of 17 very different visual tasks that test and train AI models that see and talk (Vision–Language Models) to act over many steps.
This paper explains how to turn large language models (LLMs) from quiet students that only answer questions into active agents that can plan, act, and learn over time.
Big language models can learn new facts with simple tutoring (SFT), but that doesn’t automatically teach them how to use those facts well.
The paper studies why large language models (LLMs) sound too sure of themselves when using retrieval-augmented generation (RAG) and how to fix that overconfidence.
Traditional supervised fine-tuning (SFT) makes a model copy one answer too exactly, which can cause overfitting to the exact wording instead of the real idea.
Falcon-H1R is a small (7B) AI model that thinks really well without needing giant computers.