Decoder-only language models can be great at building user profiles (embeddings), but the way we let them look at the sequence—called attention masking—changes how good those profiles are.
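To make the masking idea concrete, here is a minimal sketch of the two regimes a decoder can use when reading a sequence: causal ("see past only") versus bidirectional ("see everything"). The function name and setup are illustrative, not the paper's exact formulation.

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Return a (seq_len, seq_len) 0/1 mask where 1 means attention is allowed."""
    if causal:
        # Lower-triangular: token i may only attend to tokens 0..i (the past).
        return np.tril(np.ones((seq_len, seq_len), dtype=int))
    # Bidirectional: every token may attend to every other token.
    return np.ones((seq_len, seq_len), dtype=int)

causal_mask = attention_mask(4, causal=True)   # strictly "see past only"
full_mask = attention_mask(4, causal=False)    # "see everything"
```

With a causal mask, the embedding of an early token can never reflect later context, which is one reason masking choice matters for profile quality.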
LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.
OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).
Putting the reading passage before the question and answer choices (CQO) makes language models much more accurate than putting it after (QOC), by about 15 percentage points on average.
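The two orderings are easy to picture as prompt templates. The sketch below builds both; the helper name, section labels, and separators are my own illustration, not the paper's exact prompts.

```python
def build_prompt(context: str, question: str, options: list[str], order: str = "CQO") -> str:
    """Assemble a multiple-choice prompt in CQO (passage first) or QOC (passage last) order."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    if order == "CQO":
        parts = [f"Passage:\n{context}", f"Question: {question}", f"Options:\n{opts}"]
    else:  # QOC: question and options come before the passage
        parts = [f"Question: {question}", f"Options:\n{opts}", f"Passage:\n{context}"]
    return "\n\n".join(parts) + "\n\nAnswer:"

cqo = build_prompt("The sky appears blue due to Rayleigh scattering.",
                   "Why is the sky blue?",
                   ["Rayleigh scattering", "Reflection of the ocean"])
```

Intuitively, CQO lets the model read the question while the passage is already in context, so every later token can attend back to the evidence.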
VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.
InfiniteVGGT is a streaming 3D vision system that can keep working forever on live video without running out of memory.
This paper shows how to get strong text embeddings from decoder-only language models without any training.
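One common training-free recipe is to run text through the frozen model and pool its hidden states into a single vector. The sketch below shows mean pooling over non-padding tokens with a random tensor standing in for real hidden states; the pooling choice is an illustration and may not match the paper's exact method.

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """hidden: (batch, seq, dim) LM hidden states; mask: (batch, seq), 1 for real tokens."""
    mask = mask[..., None].astype(hidden.dtype)      # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)             # zero out padding, then sum
    counts = np.maximum(mask.sum(axis=1), 1)         # avoid divide-by-zero
    return summed / counts                           # (batch, dim) embeddings

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 8))                  # stand-in for LM hidden states
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
emb = mean_pool(hidden, mask)                        # one embedding per input text
```

No gradient step is involved: the model only does a forward pass, and the pooling turns per-token states into a sentence-level embedding.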
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
Repeating the entire prompt once (QUERY→QUERY+QUERY) helps many large language models answer better when you are not asking them to show their reasoning.
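The trick itself is a one-line prompt transformation. The wrapper name and separator below are assumptions for illustration; the core idea is simply concatenating the prompt with itself once.

```python
def repeat_query(query: str, sep: str = "\n\n") -> str:
    """Duplicate the full prompt once (QUERY -> QUERY + QUERY) so the model reads it twice."""
    return query + sep + query

prompt = repeat_query("What is 17 * 24?")
```

Because the model is autoregressive, the second copy of the question can attend over the first, giving a second "read" of the prompt before the answer is generated.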
Autoregressive (AR) models normally write one token at a time, which is accurate but slow for long answers.