Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.
Sci-CoE is a two-stage training method that helps one language model learn to both solve science problems and check those solutions with very little labeled data.
TTCS is a way for a model to teach itself during the test by first making easier practice questions that are similar to the real hard question and then learning from them.