WideSeek-R1 teaches a small 4B-parameter language model to act like a well-run team: one leader plans, many helpers work in parallel, and everyone learns together with reinforcement learning.
This paper tackles Lazy Reasoning, a common problem in reasoning AIs where the model rambles instead of making a good plan.
Long tasks trip up most AIs because the models lose track of their goals and make small mistakes that snowball over many steps.
Large language models (LLMs) are good at many math problems but often mess up simple counting when the list of items to count gets long.
WebGym is a giant practice world (almost 300,000 tasks) that lets AI web agents learn on real, ever-changing websites instead of tiny, fake ones.