DFlash is a new way to make big language models respond much faster without changing the answers they finally give.
The paper shows how to speed up reinforcement learning (RL) for large language models (LLMs) by storing numbers with fewer bits (the 8-bit FP8 format) without destabilizing training.
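The paper's actual recipe isn't reproduced here, but as a rough sketch of what FP8 quantization means in general: each tensor carries a scale factor so its values fit the narrow 8-bit range, and every value is rounded to the format's few mantissa bits. The names below (`quantize_fp8`, `FP8_E4M3_MAX`) are illustrative, and real E4M3 details (subnormals, NaN encoding) are simplified away.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_fp8(x: np.ndarray):
    """Scale a tensor so its largest value maps onto the FP8 range,
    then snap each value to 4 significant bits (simulated E4M3)."""
    scale = FP8_E4M3_MAX / max(np.abs(x).max(), 1e-12)
    scaled = np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # frexp splits each value into mantissa in [0.5, 1) and exponent;
    # rounding the mantissa to multiples of 2**-4 keeps 4 significant
    # bits, mimicking E4M3's 3 stored mantissa bits + implicit lead bit.
    mant, exp = np.frexp(scaled)
    q = np.ldexp(np.round(mant * 2**4) / 2**4, exp)
    return q, scale

def dequantize_fp8(q: np.ndarray, scale: float) -> np.ndarray:
    """Undo the scaling to recover approximate original values."""
    return q / scale

x = np.random.randn(4).astype(np.float32)
q, s = quantize_fp8(x)
print(x)
print(dequantize_fp8(q, s))  # close to x, but not bit-identical
```

The roundtrip shows the trade: values come back slightly perturbed, which is why FP8 training work focuses on keeping that noise small enough not to break the learning signal.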
ThreadWeaver teaches a language model to split big problems into smaller parts it can solve at the same time, like teammates working in parallel.
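ThreadWeaver itself is a training method, and its actual mechanism isn't shown here; the sketch below only illustrates the general decompose, solve-in-parallel, merge pattern at answer time. The `call_llm` helper is a hypothetical stand-in for a real model call.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an actual model call."""
    await asyncio.sleep(0.1)  # pretend the model is thinking
    return f"answer to: {prompt}"

async def solve_in_parallel(problem: str) -> str:
    # 1. Ask the model to break the problem into independent parts.
    plan = await call_llm(f"Split into independent subtasks: {problem}")
    subtasks = plan.splitlines() or [problem]
    # 2. Solve every part at the same time, like parallel teammates.
    answers = await asyncio.gather(*(call_llm(s) for s in subtasks))
    # 3. Merge the partial answers into one final response.
    return await call_llm(f"Combine these partial answers: {answers}")

print(asyncio.run(solve_in_parallel("plan a three-course dinner")))
```

The speedup comes from step 2: if the subtasks really are independent, their solve time overlaps instead of adding up.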