🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers6

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Rectified Flow

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Intermediate
Hao Luo, Ye Wang et al.Jan 19arXiv

Being-H0.5 is a robot brain that learns from huge amounts of human videos and robot demos so it can work on many different robots, not just one.

#Vision-Language-Action model#Unified Action Space#Human-centric learning

Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

Beginner
Runze He, Yiji Cheng et al.Jan 8arXiv

Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.

#In-Context Image Generation#Reference-based Image Editing#Structured Reasoning

Choreographing a World of Dynamic Objects

Intermediate
Yanzhe Lyu, Chen Geng et al.Jan 7arXiv

CHORD is a new way to animate 3D scenes over time (4D) where many objects move and interact, guided only by a text prompt.

#4D generation#Rectified Flow#Score Distillation Sampling

Bridging Your Imagination with Audio-Video Generation via a Unified Director

Intermediate
Jiaxu Zhang, Tianshu Hu et al.Dec 29arXiv

UniMAGE is a single “director” AI that writes a film-like script and draws the key pictures for each shot, so stories stay clear and characters look the same from scene to scene.

#Unified Director Model#Mixture-of-Transformers#Interleaved Concept Learning

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate
Zhibin Qin, Zhenxiong Tan et al.Dec 26arXiv

SpotEdit is a training‑free way to edit only the parts of an image that actually change, instead of re-generating the whole picture.

#Diffusion Transformer#Selective image editing#Region-aware editing

RecTok: Reconstruction Distillation along Rectified Flow

Intermediate
Qingyu Shi, Size Wu et al.Dec 15arXiv

RecTok is a new visual tokenizer that teaches the whole training path of a diffusion model (the forward flow) to be smart about image meaning, not just the starting latent features.

#Rectified Flow#Flow Matching#Visual Tokenizer