How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers: 11

#causal attention

LIVE: Long-horizon Interactive Video World Modeling

Intermediate
Junchao Huang, Ziyang Ye et al. · Feb 3 · arXiv

LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.

#cycle consistency #autoregressive video diffusion #exposure bias

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Intermediate
Hongzhou Zhu, Min Zhao et al. · Feb 2 · arXiv

The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.

#autoregressive video diffusion #causal attention #ODE distillation
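To make the "see-everything" versus "see-past-only" distinction concrete, here is a minimal sketch of the two attention masks in plain Python. It illustrates causal masking in general and is not the paper's distillation method; the function names (`full_mask`, `causal_mask`, `attention_weights`) are invented for this example.

```python
import math


def full_mask(n):
    """Bidirectional ("see-everything") mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Causal ("see-past-only") mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def attention_weights(scores, mask):
    """Row-wise softmax over scores; masked-out positions receive zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        exps = [math.exp(s) if allowed else 0.0
                for s, allowed in zip(row_scores, row_mask)]
        total = sum(exps)  # > 0 for both masks: the diagonal is always allowed
        weights.append([e / total for e in exps])
    return weights


# With equal scores, each row spreads its weight evenly over what it may see.
uniform_scores = [[0.0] * 3 for _ in range(3)]
print(attention_weights(uniform_scores, causal_mask(3))[1])  # [0.5, 0.5, 0.0]
```

Under the full mask, position 1 would also put weight on position 2, which is the future information a causal model never gets to use at generation time.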

Advancing Open-source World Models

Intermediate
Robbyant Team, Zelin Gao et al. · Jan 28 · arXiv

LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.

#world model #video diffusion #causal attention

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Intermediate
Pengze Zhang, Yanze Wu et al. · Jan 20 · arXiv

OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).

#spatio-temporal video transfer #identity transfer #style transfer

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Intermediate
Hyunjong Ok, Jaeho Lee · Jan 20 · arXiv

Putting the reading passage before the question and answer choices (CQO) makes language models much more accurate than putting it after (QOC), by about 15 percentage points on average.

#causal attention #prompt order sensitivity #multiple-choice question answering
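The two orderings compared in this summary can be sketched as a small prompt builder. This is only an illustration: the function name `build_prompt`, the "Passage/Question/Options" labels, and the sample inputs are assumptions, not the paper's actual templates.

```python
def build_prompt(context, question, options, order="CQO"):
    """Assemble a multiple-choice prompt with the parts in the given order.

    `order` is a permutation of "C" (context passage), "Q" (question),
    and "O" (answer options), e.g. "CQO" or "QOC".
    """
    if sorted(order) != ["C", "O", "Q"]:
        raise ValueError("order must be a permutation of 'C', 'Q', 'O'")
    option_lines = "\n".join(
        f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)
    )
    parts = {
        "C": f"Passage: {context}",
        "Q": f"Question: {question}",
        "O": f"Options:\n{option_lines}",
    }
    return "\n\n".join(parts[key] for key in order)


# Hypothetical inputs, used only to show the two layouts.
passage = "The lighthouse was built in 1820."
question = "When was the lighthouse built?"
options = ["1720", "1820", "1920"]

print(build_prompt(passage, question, options, order="CQO"))
print("-----")
print(build_prompt(passage, question, options, order="QOC"))
```

With causal attention, a token can only attend to tokens that come before it. In the QOC layout the question and option tokens are therefore processed without any view of the passage, which is one plausible reading of why the paper's title ties the accuracy gap to causal attention.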

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Intermediate
Longbin Ji, Xiaoxiong Liu et al. · Jan 9 · arXiv

VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.

#autoregressive video generation #visual autoregression #next-frame prediction

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Intermediate
Shuai Yuan, Yantai Yang et al. · Jan 5 · arXiv

InfiniteVGGT is a streaming 3D vision system that can keep working forever on live video without running out of memory.

#InfiniteVGGT #rolling memory #causal attention

KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

Intermediate
Yixuan Tang, Yi Yang · Jan 3 · arXiv

This paper shows how to get strong text embeddings from decoder-only language models without any training.

#text embeddings #decoder-only LLMs #causal attention

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Intermediate
Yuanhang Li, Yiren Song et al. · Dec 17 · arXiv

IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.

#video editing #visual effects #diffusion transformer

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Intermediate
Lanxiang Hu, Siqi Kou et al. · Dec 16 · arXiv

Autoregressive (AR) models normally write one token at a time, which is accurate but slow for long answers.

#Jacobi Forcing #Jacobi decoding #consistency distillation
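For readers new to the "Jacobi decoding" tag, here is a toy sketch contrasting one-token-at-a-time decoding with a Jacobi-style fixed-point iteration. It shows plain Jacobi decoding only, not the paper's Jacobi Forcing training recipe, and `toy_next_token` is a made-up deterministic stand-in for a greedy language model.

```python
def toy_next_token(prefix):
    """Hypothetical stand-in for greedy next-token prediction (deterministic)."""
    return (prefix[-1] * 3 + len(prefix)) % 11


def ar_decode(prompt, n):
    """Standard autoregressive decoding: n sequential model calls."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(toy_next_token(seq))
    return seq[len(prompt):]


def jacobi_decode(prompt, n, init=0):
    """Jacobi-style decoding: refine a guess for all n tokens until it stops changing.

    Each sweep recomputes every position from the previous guess. In a real
    model the sweep would be a single parallel forward pass; the loop over
    positions here is only a simulation of that pass.
    """
    guess = [init] * n
    sweeps = 0
    while True:
        sweeps += 1
        new_guess = [toy_next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new_guess == guess:  # fixed point reached
            return guess, sweeps
        guess = new_guess


prompt = [1, 2, 3]
ar_tokens = ar_decode(prompt, 8)
jacobi_tokens, sweeps = jacobi_decode(prompt, 8)
print(ar_tokens == jacobi_tokens)  # True
print(sweeps)
```

The fixed point matches the greedy autoregressive output because after k sweeps the first k tokens are already correct, so at most n + 1 sweeps are needed. Any speedup comes from sweeps that happen to fix several tokens at once.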

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Intermediate
Jia-Nan Li, Jian Guan et al. · Dec 15 · arXiv

ReFusion is a new way for AI to write text faster by planning in chunks (called slots) and then filling each chunk carefully.

#ReFusion #masked diffusion model #parallel decoding