Papers (943)

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.

#video face swapping #image face swapping #diffusion transformer

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

Intermediate

Rong Zhou, Dongping Chen et al. · Jan 4 · arXiv

A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with sensors and helps us predict, fix, and improve the real thing.

#digital twin #physics-informed AI #neural operators

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

Intermediate

Hansen Jin Lillemark, Benhao Huang et al. · Jan 3 · arXiv

This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.

#flow equivariance #world model #partially observed environments

KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

Intermediate

Yixuan Tang, Yi Yang · Jan 3 · arXiv

This paper shows how to get strong text embeddings from decoder-only language models without any training.

#text embeddings #decoder-only LLMs #causal attention

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Intermediate

Max Ruiz Luyten, Mihaela van der Schaar · Jan 2 · arXiv

Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.

#Distributional Creative Reasoning #diversity energy #creativity kernel

Fast-weight Product Key Memory

Intermediate

Tianyu Zhao, Llion Jones · Jan 2 · arXiv

The paper introduces Fast-weight Product Key Memory (FwPKM), a memory layer that can quickly learn from the current text it reads, not just from past training.

#Fast-weight memory #Product Key Memory #Sparse retrieval

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Intermediate

Taekyung Ki, Sangwon Jang et al. · Jan 2 · arXiv

This paper builds a real-time talking-listening head avatar that reacts naturally to your words, tone, nods, and smiles in about half a second.

#interactive avatar #talking head generation #causal diffusion forcing

CPPO: Contrastive Perception for Vision Language Policy Optimization

Intermediate

Ahmad Rezaei, Mohsen Gholami et al. · Jan 1 · arXiv

CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.

#CPPO #Contrastive Perception Loss #Vision-Language Models

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Intermediate

Shengjun Zhang, Zhang Zhang et al. · Jan 1 · arXiv

This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.

#E-GRPO #Group Relative Policy Optimization #Flow Matching
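Two ingredients of the summary can be illustrated in a small sketch. The first is the group-relative advantage that GRPO-style training uses: rewards for several samples of the same prompt are standardized within the group (population standard deviation and the `eps` guard are assumptions; implementations differ). The second is a purely hypothetical step filter: if each stochastic sampling step is treated as a Gaussian with noise scale sigma, its entropy grows with sigma, so ranking steps by entropy picks the early, noisy ones. The function names, the Gaussian-entropy criterion, and the noise schedule are illustrative assumptions, not the paper's exact rule.

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def gaussian_step_entropy(sigma, dim=1):
    """Differential entropy of an isotropic Gaussian N(mu, sigma^2 I) in `dim` dims."""
    return 0.5 * dim * math.log(2 * math.pi * math.e * sigma ** 2)


def select_high_entropy_steps(sigmas, n):
    """Hypothetical rule: keep the n sampling steps with the largest entropy."""
    ranked = sorted(range(len(sigmas)),
                    key=lambda t: gaussian_step_entropy(sigmas[t]),
                    reverse=True)
    return sorted(ranked[:n])


# Four samples for one prompt, scored by a (hypothetical) preference reward.
advantages = group_relative_advantages([0.9, 0.2, 0.4, 0.5])
# Per-step noise scales, ordered from the first (noisiest) step to the last.
steps = select_high_entropy_steps([1.0, 0.8, 0.5, 0.3, 0.1], n=2)
```

Standardizing within the group removes the need for a learned value baseline: a sample is pushed up only if it beats its siblings for the same prompt.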

Deep Delta Learning

Intermediate

Yifan Zhang, Yifeng Liu et al. · Jan 1 · arXiv

Deep Delta Learning (DDL) replaces the usual “add the shortcut” rule in deep networks with a smarter, learnable move that can gently erase old info and write new info along a chosen direction.

#Deep Delta Learning #Delta Operator #Residual connection
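The phrase "erase old info and write new info along a chosen direction" reads like a delta-rule update. Below is a minimal sketch of such a rank-one update on a single vector, assuming a unit-normalized direction `k`, a scalar write value `v`, and a gate `beta`; the function name and this parametrization are illustrative, and how the paper learns `k`, `v`, and `beta` is not shown.

```python
import math


def delta_residual(h, k, v, beta):
    """Rank-one delta update: h + beta * k_hat * (v - k_hat . h).

    k is normalized to a unit direction k_hat. beta = 0 leaves h unchanged,
    beta = 1 overwrites the component of h along k_hat with the scalar v, and
    beta = 2 with v = 0 reflects h across the hyperplane orthogonal to k_hat.
    """
    norm = math.sqrt(sum(x * x for x in k))
    k_hat = [x / norm for x in k]
    old = sum(a * b for a, b in zip(k_hat, h))  # what h currently stores along k_hat
    return [a + beta * kd * (v - old) for a, kd in zip(h, k_hat)]


h = [3.0, 4.0]
keep = delta_residual(h, k=[1.0, 0.0], v=10.0, beta=0.0)   # identity
write = delta_residual(h, k=[1.0, 0.0], v=10.0, beta=1.0)  # erase 3.0, write 10.0
flip = delta_residual(h, k=[1.0, 0.0], v=0.0, beta=2.0)    # Householder reflection
```

A plain residual connection would return `h + f(h)` with no erase term, so whatever is already stored along a direction can only be added to, never replaced.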

MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

Intermediate

Xiaokun Sun, Zeyu Cai et al. · Jan 1 · arXiv

MorphAny3D is a training-free way to smoothly change one 3D object into another, even if they are totally different (like a bee into a biplane).

#3D morphing #Structured Latent #SLAT

SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Beginner

Zhening Huang, Hyeonho Jeong et al. · Dec 31 · arXiv

SpaceTimePilot is a video AI that lets you steer both where the camera goes (space) and how the action plays out (time), starting from a single input video.

#video diffusion #space–time disentanglement #camera control
