Papers (1262)

Towards a Science of Scaling Agent Systems

Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.

#multi-agent systems #agentic evaluation #scaling laws

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Intermediate

Yexin Liu, Manyuan Zhang et al. · Dec 9 · arXiv

OpenSubject is a giant video-based dataset (2.5M samples, 4.35M images) built to help AI make pictures that keep each person or object looking like themselves, even in busy scenes.

#subject-driven generation #identity fidelity #video-derived dataset

EgoX: Egocentric Video Generation from a Single Exocentric Video

Intermediate

Taewoong Kang, Kinam Kim et al. · Dec 9 · arXiv

EgoX turns a regular third-person video into a first-person video that looks like it was filmed from the actor’s eyes.

#egocentric video generation #exocentric to egocentric #video diffusion models

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

Intermediate

Meng Wei, Chenyang Wan et al. · Dec 9 · arXiv

Robots that follow spoken instructions used to be slow and jerky because one big model tried to think and move at the same time.

#vision-and-language navigation #VLM planner #dual-system architecture

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Intermediate

Zheng Ding, Weirui Ye · Dec 9 · arXiv

TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.

#TreeGRPO #reinforcement learning #diffusion models

Position: Universal Aesthetic Alignment Narrows Artistic Expression

Intermediate

Wenqi Marshall Guo, Qingyun Qian et al. · Dec 9 · arXiv

The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.

#universal aesthetic alignment #aesthetic pluralism #reward models

Relational Visual Similarity

Intermediate

Thao Nguyen, Sicheng Mo et al. · Dec 8 · arXiv

Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner

Jiehui Huang, Yuechen Zhang et al. · Dec 8 · arXiv

UnityVideo is a single, unified model that learns from many kinds of video information at once, including color (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation, to make smarter, more realistic videos.

#multimodal video generation #multi-task learning #dynamic noise scheduling

One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Intermediate

Yuan Gao, Chen Chen et al. · Dec 8 · arXiv

This paper shows that we can turn big, smart vision features into a small, easy-to-use code for image generation with just one attention layer.

#Feature Auto-Encoder #FAE #Self-Supervised Learning

Group Representational Position Encoding

Intermediate

Yifan Zhang, Zixiang Chen et al. · Dec 8 · arXiv

GRAPE is a new way to tell Transformers where each word is in a sentence by using neat math moves called group actions.

#GRAPE #positional encoding #group actions

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Beginner

Zhaochong An, Menglin Jia et al. · Dec 8 · arXiv

OneStory is a new way to make long videos from many shots that stay consistent with the story, characters, and places across time.

#multi-shot video generation #adaptive memory #frame selection

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Beginner

Charlie Zhang, Graham Neubig et al. · Dec 8 · arXiv

The paper asks when reinforcement learning (RL) really makes language models better at reasoning beyond what they learned in pre-training.

#edge of competence #process-verified evaluation #process-level rewards
