Papers (1055)

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Taekyung Ki, Sangwon Jang et al. · Jan 2 · arXiv

This paper builds a real-time talking-listening head avatar that reacts naturally to your words, tone, nods, and smiles in about half a second.

#interactive avatar #talking head generation #causal diffusion forcing

CPPO: Contrastive Perception for Vision Language Policy Optimization

Intermediate

Ahmad Rezaei, Mohsen Gholami et al. · Jan 1 · arXiv

CPPO is a new way to fine-tune vision-language models so they see pictures more accurately before they start to reason.

#CPPO #Contrastive Perception Loss #Vision-Language Models

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Intermediate

Shengjun Zhang, Zhang Zhang et al. · Jan 1 · arXiv

This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.

#E-GRPO #Group Relative Policy Optimization #Flow Matching

Deep Delta Learning

Intermediate

Yifan Zhang, Yifeng Liu et al. · Jan 1 · arXiv

Deep Delta Learning (DDL) replaces the usual “add the shortcut” rule in deep networks with a smarter, learnable move that can gently erase old info and write new info along a chosen direction.

#Deep Delta Learning #Delta Operator #Residual connection

MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

Intermediate

Xiaokun Sun, Zeyu Cai et al. · Jan 1 · arXiv

MorphAny3D is a training-free way to smoothly change one 3D object into another, even if they are totally different (like a bee into a biplane).

#3D morphing #Structured Latent #SLAT

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Intermediate

Yi-Chuan Huang, Hao-Jen Chien et al. · Dec 31 · arXiv

GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.

#3D reconstruction #outpainting #multi-view diffusion

Scaling Open-Ended Reasoning to Predict the Future

Intermediate

Nikhil Chandak, Shashwat Goel et al. · Dec 31 · arXiv

The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.

#open-ended forecasting #calibrated prediction #Brier score

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Intermediate

Siyuan Hu, Kevin Qinghong Lin et al. · Dec 31 · arXiv

GUI agents usually click like a woodpecker but struggle to drag smoothly like a human hand; this paper addresses that with a flow-based generative model for continuous control.

#GUI automation #continuous control #flow matching

BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts

Intermediate

Hengli Li, Zhaoxin Yu et al. · Dec 31 · arXiv

This paper presents BEDA, a simple way to make chatty AI act strategically by turning what it believes into gentle rules (probabilistic constraints) that guide what it can say.

#strategic dialogue #belief estimation #probabilistic constraints

mHC: Manifold-Constrained Hyper-Connections

Intermediate

Zhenda Xie, Yixuan Wei et al. · Dec 31 · arXiv

The paper fixes a stability problem in Hyper-Connections (HC) by gently steering the network’s mixing matrix onto a safe shape (a manifold) where signals don’t blow up or vanish.

#Residual Connections #Hyper-Connections #Manifold Projection

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Intermediate

Weixun Wang, XiaoXiao Xu et al. · Dec 31 · arXiv

This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.

#agentic LLMs #reinforcement learning #IPA

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Intermediate

Karthik Dharmarajan, Wenlong Huang et al. · Dec 31 · arXiv

Dream2Flow lets a robot watch a short, AI-generated video of a task and then do that task in real life by following object motion in 3D.

#3D object flow #video generation for robotics #open-world manipulation
