Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan only a few steps ahead.
The paper studies how to teach a smaller language model using a bigger one by focusing only on the most useful bits instead of everything.
The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.
JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.
Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.