ERNIE 5.0 is a single giant model that can read and create text, images, video, and audio by predicting the next pieces step by step, like writing a story one line at a time.
LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.
JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.
Goal Force lets you tell a video model what physical result you want (for example, “make this ball move left with a strong push”) instead of relying on vague text or a final picture.
Coding agents that fix software rely on feedback, but unit tests give only pass/fail signals, and those signals are often noisy or missing.
The paper introduces Canon layers, tiny add-ons that let nearby words share information directly, like passing notes along a row of desks.
Before this work, most big language models generated text one word at a time (autoregressively), which made generation slow and hard to parallelize.
ProPhy is a new two-step method that helps video AIs follow real-world physics, not just make pretty pictures.