Reinforcement learning (RL) trains language models by letting them try answers and learn from rewards, but training is slow when the practice questions are poorly chosen.
This paper proposes ReSID, a new way to turn items into short token codes (Semantic IDs) that a recommender can predict far more easily.
This paper turns language models' messy chains of thought into clear, named steps, letting us see how the models actually work through math problems.