This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.
Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.
MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step, not just sound caring.
The paper shows that video models do not need long, human-like chains of thought to reason well.
Diffusion language models write by gradually unmasking hidden words, so deciding which blanks to reveal next is a big deal for both speed and accuracy.
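To make that "which blanks to reveal next" decision concrete, here is a minimal confidence-based sketch; the function and variable names are hypothetical, and the paper's actual selection rule may differ:

```python
import numpy as np

def unmask_step(token_probs, mask, k=2):
    """One decoding step for a masked diffusion LM (illustrative sketch).

    token_probs: (seq_len, vocab) array of model probabilities.
    mask: boolean array, True where a position is still hidden.
    Reveals the k masked positions where the model is most confident.
    """
    confidence = token_probs.max(axis=-1)   # per-position top-1 probability
    confidence[~mask] = -np.inf             # already-revealed slots are ineligible
    reveal = np.argsort(confidence)[-k:]    # k most confident masked positions
    new_tokens = token_probs[reveal].argmax(axis=-1)
    new_mask = mask.copy()
    new_mask[reveal] = False
    return reveal, new_tokens, new_mask
```

Revealing many positions per step is faster but riskier; revealing few is slower but lets later steps condition on more committed text, which is exactly the speed/accuracy trade-off the summary points at.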
TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.
The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.
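The contrast between the two grading schemes can be sketched in a few lines; both reward functions below are hypothetical stand-ins, not the paper's exact formulation:

```python
def per_digit_reward(pred_digits, true_digits):
    """Dense per-token grading: partial credit for each matching digit."""
    matches = sum(p == t for p, t in zip(pred_digits, true_digits))
    return matches / max(len(true_digits), 1)

def whole_number_reward(pred_digits, true_digits):
    """Outcome-level grading: full reward only if the whole number is right."""
    return 1.0 if list(pred_digits) == list(true_digits) else 0.0
```

Under per-digit grading, predicting "129" for "123" still earns 2/3 of the reward even though the number is wrong; whole-number grading gives it zero, which is the signal the paper argues trains better.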
EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.
Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the “safe zone,” causing unstable learning.
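One standard way to keep updates inside that "safe zone" is a PPO-style clipped probability ratio; this is a generic single-token sketch (names hypothetical), not necessarily the mechanism this paper proposes:

```python
import math

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """PPO-style clipped objective for one token (illustrative sketch).

    The ratio new/old is clipped to [1 - eps, 1 + eps], so a single
    update cannot push the policy too far from the behavior policy
    that generated the off-policy data.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    # Taking the min makes the objective pessimistic: the clip removes
    # the incentive to move the ratio outside the trusted interval.
    return min(ratio * advantage, clipped * advantage)
```

When the new policy already assigns twice the old probability to a token with positive advantage, the clipped term caps the objective at 1.2x the advantage instead of 2x, which is what stabilizes training far off-policy.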
Large language models forget or misuse new facts if you edit their weights just once; EtCon fixes this with a two-step plan.
This paper teaches image models to keep things consistent across multiple pictures—like the same character, art style, and story logic—using reinforcement learning (RL).
RealGen is a new way to make computer-made pictures look so real that they can fool expert detectors and even careful judges.