LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
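For readers unfamiliar with the jargon: "see-everything" vs. "see-past-only" is the usual bidirectional-vs-causal attention distinction. Below is a minimal, generic sketch of that conversion via masking; it illustrates the setting only, not the paper's specific fix, and the function name `attention` is illustrative.

```python
# Generic sketch (not the paper's method): turning "see-everything"
# (bidirectional) attention into "see-past-only" (causal) attention by
# masking out attention to future positions.
import torch
import torch.nn.functional as F

def attention(q, k, v, causal=False):
    # q, k, v: (num_frames, dim) frame/token features
    scores = q @ k.T / (q.shape[-1] ** 0.5)        # (T, T) similarity scores
    if causal:
        T = scores.shape[0]
        # Block frame t from attending to frames > t (the "future").
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)            # rows sum to 1
    return weights @ v

T, d = 4, 8
q = k = v = torch.randn(T, d)
out_bidir = attention(q, k, v, causal=False)   # every frame sees all frames
out_causal = attention(q, k, v, causal=True)   # frame t sees frames 0..t only
print(out_bidir.shape, out_causal.shape)
```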
Autoregressive (AR) image models make pictures by choosing tokens one by one, but they are judged only on how likely each chosen token is, not on how good the final picture looks in pixels.
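A tiny sketch of that mismatch, under assumed names (`decode_tokens` is a hypothetical stand-in for a VQ-style decoder): training optimizes token-level cross-entropy, while pixel quality only becomes visible after decoding.

```python
# Illustrative sketch of the objective mismatch (not the paper's code).
import torch
import torch.nn.functional as F

logits = torch.randn(16, 1024)                   # scores for 16 positions, 1024-token vocab
target_tokens = torch.randint(0, 1024, (16,))    # "correct" next tokens

# What training optimizes: likelihood of the right next token.
token_loss = F.cross_entropy(logits, target_tokens)

def decode_tokens(tokens):
    # Hypothetical stand-in for a tokenizer/decoder mapping tokens -> pixels.
    return tokens.float().reshape(4, 4) / 1024.0

# What we actually care about: how the decoded image looks in pixel space.
pred_img = decode_tokens(logits.argmax(dim=-1))
real_img = decode_tokens(target_tokens)
pixel_error = F.mse_loss(pred_img, real_img)     # never part of the loss

print(f"token loss: {token_loss:.3f}  pixel error: {pixel_error:.4f}")
```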
This paper fixes a common problem in video-making AIs where tiny mistakes snowball over time and ruin long videos.
LLM judges are cheap but biased; without calibration they can completely flip which model looks best.
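One common calibration trick (not necessarily the one this paper uses): query the judge twice with the answer order swapped, and count a win only when both orderings agree. `judge` below is a hypothetical stand-in for an LLM call that returns which answer it prefers ("A" or "B").

```python
# Sketch of position-swap calibration for a pairwise LLM judge.
def calibrated_verdict(judge, prompt, answer_1, answer_2):
    first = judge(prompt, answer_1, answer_2)    # answer_1 shown as "A"
    second = judge(prompt, answer_2, answer_1)   # order swapped
    if first == "A" and second == "B":
        return "answer_1"      # preferred in both orderings
    if first == "B" and second == "A":
        return "answer_2"
    return "tie"               # verdict flipped with position: treat as a tie

# Toy judge that always prefers whatever is shown first (pure position bias):
biased_judge = lambda prompt, a, b: "A"
print(calibrated_verdict(biased_judge, "Q?", "x", "y"))  # -> "tie"
```

With a purely position-biased judge, calibration yields ties instead of flipping the leaderboard.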
This paper fixes two big problems in image-making AIs that build pictures step by step: they practice with perfect answers (teacher forcing) but must later perform using their own imperfect guesses, and the earliest coarse steps are much harder than the later fine steps.
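A toy illustration of the first problem (generic, not this paper's method): during training each step is conditioned on the true history, but at generation time it is conditioned on the model's own earlier outputs, so early errors feed forward into later steps.

```python
# Teacher forcing vs. free-running rollout on a toy one-step predictor.
import torch
import torch.nn as nn

step = nn.Linear(4, 4)                     # toy one-step predictor
ground_truth = torch.randn(10, 4)          # a 10-step target sequence

# Training: each prediction is conditioned on the TRUE previous step.
teacher_forced = [step(ground_truth[t]) for t in range(9)]

# Generation: each prediction is conditioned on the model's OWN previous
# output, so any error at step t is carried into step t+1.
x = ground_truth[0]
free_running = []
for t in range(9):
    x = step(x)                            # feed back the model's own guess
    free_running.append(x)

drift = (torch.stack(teacher_forced) - torch.stack(free_running)).norm(dim=-1)
print(drift)                               # per-step disagreement between the two rollouts
```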
Large language models forget or misuse new facts if you only nudge their weights once; EtCon fixes this with a two-step plan.