Papers (6)

#Rectified Flow

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Being-H0.5 is a vision-language-action model that learns from large amounts of human video and robot demonstrations so it can work on many different robots, not just one.

#Vision-Language-Action model #Unified Action Space #Human-centric learning

Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

Beginner

Runze He, Yiji Cheng et al. · Jan 8 · arXiv

Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.

#In-Context Image Generation #Reference-based Image Editing #Structured Reasoning

Choreographing a World of Dynamic Objects

Intermediate

Yanzhe Lyu, Chen Geng et al. · Jan 7 · arXiv

CHORD is a new way to animate 3D scenes over time (4D) where many objects move and interact, guided only by a text prompt.

#4D generation #Rectified Flow #Score Distillation Sampling

Bridging Your Imagination with Audio-Video Generation via a Unified Director

Intermediate

Jiaxu Zhang, Tianshu Hu et al. · Dec 29 · arXiv

UniMAGE is a single “director” AI that writes a film-like script and draws the key pictures for each shot, so stories stay clear and characters look the same from scene to scene.

#Unified Director Model #Mixture-of-Transformers #Interleaved Concept Learning

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate

Zhibin Qin, Zhenxiong Tan et al. · Dec 26 · arXiv

SpotEdit is a training-free way to edit only the parts of an image that actually change, instead of regenerating the whole picture.

#Diffusion Transformer #Selective image editing #Region-aware editing

RecTok: Reconstruction Distillation along Rectified Flow

Intermediate

Qingyu Shi, Size Wu et al. · Dec 15 · arXiv

RecTok is a new visual tokenizer that makes the entire forward flow of a diffusion model, not just the initial latent features, carry semantic information about the image.

#Rectified Flow #Flow Matching #Visual Tokenizer
