How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (31)

#LoRA fine-tuning

Latent Implicit Visual Reasoning

Intermediate
Kelvin Li, Chuyi Shang et al. · Dec 24 · arXiv

Large Multimodal Models (LMMs) are great at reading text and looking at pictures, but they usually do most of their thinking in words, which limits deep visual reasoning.

#Latent Implicit Visual Reasoning #latent tokens #bottleneck attention masking

StoryMem: Multi-shot Long Video Storytelling with Memory

Intermediate
Kaiwen Zhang, Liming Jiang et al. · Dec 22 · arXiv

StoryMem is a new way to make minute‑long, multi‑shot videos that keep the same characters, places, and style across many clips.

#StoryMem #Memory-to-Video #multi-shot video generation

Region-Constraint In-Context Generation for Instructional Video Editing

Intermediate
Zhongwei Zhang, Fuchen Long et al. · Dec 19 · arXiv

ReCo is a new way to edit videos just by describing the change in words, with no extra masks needed.

#instruction-based video editing #in-context generation #region constraint

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Intermediate
Hoiyeong Jin, Hyojin Jang et al. · Dec 19 · arXiv

InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.

#video object insertion #4D scene geometry #diffusion video generation

Animate Any Character in Any World

Intermediate
Yitong Wang, Fangyun Wei et al. · Dec 18 · arXiv

AniX is a system that lets you place any character into any 3D world and control them with plain language, like “run forward” or “play a guitar.”

#AniX #3D Gaussian Splatting #world models

EgoX: Egocentric Video Generation from a Single Exocentric Video

Intermediate
Taewoong Kang, Kinam Kim et al. · Dec 9 · arXiv

EgoX turns a regular third-person video into a first-person video that looks like it was filmed through the actor’s eyes.

#egocentric video generation #exocentric to egocentric #video diffusion models

Relational Visual Similarity

Intermediate
Thao Nguyen, Sicheng Mo et al. · Dec 8 · arXiv

Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.

#relational similarity #visual analogy #anonymous captions