This paper investigates why training AI agents that act over many steps (such as browsing the web or moving around a house) often becomes unstable and collapses.
This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.
RecTok is a new visual tokenizer that makes the whole training path of a diffusion model (the forward flow) semantically meaningful, not just the starting latent features.