Reinforcement learning (RL) trains language models by letting them try answers and learn from rewards, but training is slow if we pick the wrong practice questions.
ReGFT is a simple pre-RL step that shows the model partial human hints, then makes it solve problems in its own words, creating correct, model-style solutions for hard questions.
Large multimodal models (LMMs) can look at pictures and read text, but they still miss tricky cases, like tiny chart labels or multi-step math.
GigaBrain-0.5M* is a robot brain that sees, reads, and acts, and it gets smarter by imagining the future before moving.
The paper teaches language models to explore more ideas while thinking, so they can solve harder problems.
RISE lets a robot learn safely and cheaply by practicing in its imagination instead of always in the real world.
Search engines on social apps used to rely on many separate mini-models that often misunderstood slang and were hard to keep updated.
This paper teaches a language model to improve its own math answers by first writing several drafts and then learning to beat its best draft.
The paper shows that the popular PPO method for training language models is unfair to rare words and too gentle with very common words, which makes learning slow and unstable.
The paper shows how to train a language model with special extra hints (privileged information) during practice so it can still do well later without any hints.
Large language models learn better when we spend more practice time on the right questions at the right moments.
The paper shows that a model that looks great after supervised fine-tuning (SFT) can actually end up worse after the same reinforcement learning (RL) training than a model that looked weaker at SFT time.